An end-to-end LangGraph agent system that autonomously discovers, enriches, scores, and contacts B2B prospects.

swatv3nub/LeadSync

LangGraph Outbound Lead Generation Workflow

A sophisticated, AI-powered outbound lead generation system built with LangGraph 1.0.0. It automates the entire process from prospect discovery to personalized outreach and performance optimization, and uses ChromaDB memory persistence to enable continuous learning and prevent duplicate outreach.

🎯 Overview

This project implements an intelligent, self-improving lead generation system that:

  • ✅ Discovers and enriches B2B prospects from multiple sources
  • ✅ Generates personalized, AI-powered email outreach
  • ✅ Tracks engagement (opens, clicks, replies) in real-time
  • ✅ Learns from historical performance to improve future campaigns
  • ✅ Prevents duplicate outreach using memory persistence
  • ✅ Provides human-in-the-loop approval gates for quality control

🚀 Key Features

  • 🔍 Intelligent Prospect Discovery: Multi-source lead enrichment using Apollo.io, Clay, and Clearbit
  • ✉️ Personalized Email Outreach: AI-generated, context-aware email campaigns with GPT-4o-mini
  • 📊 Response Tracking: Real-time monitoring of opens, clicks, and replies via SendGrid
  • 🧠 Continuous Learning: FeedbackTrainer analyzes performance and suggests data-driven improvements
  • 💾 Memory Persistence: ChromaDB-based storage prevents duplicate outreach and enables historical learning
  • 🔄 Human-in-the-Loop: Manual approval gates for lead selection and email content
  • 📈 Performance Analytics: Detailed metrics and trend analysis across campaigns
  • 📁 Organized Export: Prospects automatically saved to organized folders (CSV/JSON/Excel) with timestamps

🏗️ Architecture

The system uses LangGraph 1.0.0 to orchestrate a multi-agent workflow with ChromaDB memory persistence:

ProspectSearchAgent → OutreachExecutorAgent → ResponseTrackerAgent → FeedbackTrainerAgent
        ↓                       ↓                      ↓                      ↓
   [Find Leads]          [Send Emails]          [Track Responses]      [Optimize]
        ↓                       ↓                      ↓                      ↓
  Human Approval         Human Approval         Auto (with memory)    Store Recommendations
        ↓                       ↓                      ↓                      ↓
   Deduplicate           Log Interactions       Track Engagement      Historical Learning
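
The four-stage flow in the diagram can be pictured as a chain of functions that each read and extend a shared state dictionary. The sketch below is a plain-Python stand-in and does not use the LangGraph API; the function names, state keys, and sample data are illustrative assumptions, and the human approval gates are left out.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def prospect_search(state: State) -> State:
    # Hypothetical stage: pretend two leads were discovered.
    return {**state, "leads": ["a@example.com", "b@example.com"]}


def outreach_executor(state: State) -> State:
    # Hypothetical stage: record every lead as contacted.
    return {**state, "sent": list(state["leads"])}


def response_tracker(state: State) -> State:
    # Hypothetical stage: pretend only the first recipient replied.
    return {**state, "replies": state["sent"][:1]}


def feedback_trainer(state: State) -> State:
    # Derive a simple metric that a later run could learn from.
    reply_rate = len(state["replies"]) / len(state["sent"])
    return {**state, "reply_rate": reply_rate}


PIPELINE: List[Callable[[State], State]] = [
    prospect_search,
    outreach_executor,
    response_tracker,
    feedback_trainer,
]


def run_pipeline(state: State) -> State:
    # Run the stages in order, passing the growing state along.
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Calling `run_pipeline({})` returns the final state with the keys added by every stage.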

🧠 Memory Persistence Layer

ChromaDB-based memory system with 4 collections:

WorkflowMemory (ChromaDB)
├── leads_collection          # All discovered and contacted leads (deduplication)
├── campaigns_collection       # Campaign execution history and metrics
├── interactions_collection    # Email interactions (opened, clicked, replied)
└── recommendations_collection # AI-generated improvement suggestions

Memory enables:

  • ✅ Lead deduplication (never contact the same person twice)
  • ✅ Performance tracking (monitor metrics across campaigns)
  • ✅ Historical learning (FeedbackTrainer uses past data to improve)
  • ✅ Full audit trail (complete interaction history)

See docs/MEMORY.md for complete memory documentation.
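
A minimal sketch of the deduplication idea, using an in-memory dictionary in place of the ChromaDB leads_collection. The class name `LeadStore`, its methods, and the choice of a normalized email address as the lead key are assumptions made for illustration, not the project's actual WorkflowMemory interface.

```python
from typing import Dict, Iterable, List


class LeadStore:
    """In-memory stand-in for a persistent leads collection (hypothetical)."""

    def __init__(self) -> None:
        self._leads: Dict[str, dict] = {}

    @staticmethod
    def _key(email: str) -> str:
        # Normalize so that case and stray whitespace do not defeat matching.
        return email.strip().lower()

    def filter_new(self, leads: Iterable[dict]) -> List[dict]:
        """Return only leads not seen before, and remember them."""
        fresh = []
        for lead in leads:
            key = self._key(lead["email"])
            if key in self._leads:
                continue  # already discovered or contacted
            self._leads[key] = lead
            fresh.append(lead)
        return fresh

    def has_lead(self, email: str) -> bool:
        return self._key(email) in self._leads
```

A second call to `filter_new` with an address that was already stored returns only the leads that are new.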

🤖 Agent Architecture

Four specialized agents work together:

  1. ProspectSearchAgent: Discovers prospects using multiple APIs

    • Sources: Apollo.io, Clay, Clearbit
    • Memory integration: Automatically deduplicates leads before returning
    • Output: Enriched lead profiles
  2. OutreachExecutorAgent: Generates and sends personalized emails

    • AI-powered personalization using GPT-4o-mini
    • SendGrid delivery with rate limiting
    • Memory integration: Logs "email_sent" interactions
  3. ResponseTrackerAgent: Monitors email engagement

    • Tracks opens, clicks, replies in real-time
    • Calculates campaign metrics
    • Memory integration: Logs all engagement interactions
  4. FeedbackTrainerAgent: Analyzes performance and optimizes

    • Compares current performance to historical trends
    • Learns from best-performing campaigns
    • Memory integration: Retrieves trends, stores recommendations

📋 Prerequisites

  • Python 3.9+
  • API Keys for:
    • OpenAI (required for AI agents)
    • Apollo.io (for prospect search - free tier available)
    • Clay (optional - free trial)
    • Clearbit or PeopleDataLabs (optional - for enrichment)
    • SendGrid (for email delivery - free tier available)
    • Google Cloud (optional - for Sheets logging)

🚀 Quick Start

1. Clone and Setup

# Navigate to the project directory (adjust the path to your local checkout)
cd x:\Project\ProspectToLead

# Create virtual environment
python -m venv venv

# Activate virtual environment
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

2. Configure Environment

# Copy example environment file
copy .env.example .env

# Edit .env with your API keys
notepad .env

Minimum required configuration:

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
APOLLO_API_KEY=your_apollo_api_key_here
SENDGRID_API_KEY=your_sendgrid_api_key_here
SENDGRID_FROM_EMAIL=your_email@company.com
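
Before running the workflow, it can help to confirm that the minimum settings are present. The sketch below assumes the values from .env have already been loaded into the process environment; the function name `missing_settings` is hypothetical and is not the project's scripts/check_config.py.

```python
import os
from typing import List, Mapping

# The minimum settings listed above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "OPENAI_MODEL",
    "APOLLO_API_KEY",
    "SENDGRID_API_KEY",
    "SENDGRID_FROM_EMAIL",
]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```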

3. Run the Workflow

# Execute the full workflow
python langgraph_builder.py

# With custom config
python langgraph_builder.py --config ./workflow.json

# Generate workflow visualization
python langgraph_builder.py --visualize

4. View Found Prospects

After the workflow runs, prospects are automatically stored in memory and can be viewed/exported:

# View prospects in console
python scripts\view_prospects.py

# Export to CSV (saved to prospects/csv/ folder)
python scripts\view_prospects.py --csv

# Export to JSON (saved to prospects/json/ folder)
python scripts\view_prospects.py --json

# Export to Excel (saved to prospects/xlsx/ folder)
python scripts\view_prospects.py --xlsx

# Export all formats at once
python scripts\view_prospects.py --csv --json --xlsx

Exported files are organized by format:

  • prospects/csv/prospects_20251019_143025.csv - Spreadsheet format
  • prospects/json/prospects_20251019_143025.json - Structured data format
  • prospects/xlsx/prospects_20251019_143025.xlsx - Excel format with formatting
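
The file names above follow a `prospects_YYYYMMDD_HHMMSS` pattern inside a per-format folder. A minimal sketch of how such a path could be built is shown below; the helper name `export_path` is an assumption, not a function from the project.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

SUPPORTED_FORMATS = ("csv", "json", "xlsx")


def export_path(fmt: str, now: Optional[datetime] = None) -> str:
    """Build prospects/<fmt>/prospects_<timestamp>.<fmt> for one export."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt!r}")
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return str(PurePosixPath("prospects") / fmt / f"prospects_{stamp}.{fmt}")
```

For example, `export_path("csv", datetime(2025, 10, 19, 14, 30, 25))` gives `prospects/csv/prospects_20251019_143025.csv`.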

5. View Memory Statistics

Check campaign history and memory statistics:

# View all memory statistics
python scripts\view_memory.py --all

# View specific information
python scripts\view_memory.py --stats       # Overall statistics
python scripts\view_memory.py --campaigns   # Recent campaigns
python scripts\view_memory.py --trends      # Performance trends

# Check if a lead exists
python scripts\view_memory.py --check-lead john.doe@company.com

6. Clear Memory (Optional)

Remove data from memory when needed:

# Show current memory statistics
python scripts\clear_memory.py --stats

# Clear only prospects/leads
python scripts\clear_memory.py --leads

# Clear all data (with confirmation)
python scripts\clear_memory.py --all

# Force clear without confirmation (use with caution!)
python scripts\clear_memory.py --all --force

⚠️ Note: Cleared data cannot be recovered. Export prospects first if needed:

python scripts\view_prospects.py --csv --json
python scripts\clear_memory.py --leads

See docs/CLEAR_MEMORY.md for detailed documentation.

7. Interactive Feedback & Recommendation Review

Provide feedback on campaign performance and approve AI-generated recommendations:

# Interactive menu
python scripts\interactive_feedback.py

# List all campaigns
python scripts\interactive_feedback.py --list-campaigns

# Analyze a specific campaign with interactive prompts
python scripts\interactive_feedback.py --analyze campaign_20251019_143025

# View pending recommendations
python scripts\interactive_feedback.py --view-pending

# Review and approve/reject pending recommendations
python scripts\interactive_feedback.py --approve-pending

📁 Project Structure

ProspectToLead/
├── agents/
│   ├── __init__.py              # Agent factory
│   ├── base_agent.py            # Base agent with ReAct pattern
│   ├── prospect_search_agent.py # Prospect discovery with memory deduplication
│   ├── enrichment_agent.py      # Data enrichment
│   ├── scoring_agent.py         # Lead scoring
│   ├── outreach_content_agent.py # Content generation
│   ├── outreach_executor_agent.py # Email sending with interaction logging
│   ├── response_tracker_agent.py # Response tracking with memory
│   └── feedback_trainer_agent.py # Performance analysis with historical learning
├── utils/
│   ├── config.py                # Configuration loader
│   ├── logger.py                # Logging utilities
│   ├── llm.py                   # OpenAI GPT-4o-mini integration
│   ├── memory.py                # ChromaDB memory persistence layer
│   └── tools.py                 # API client integrations
├── scripts/
│   ├── check_config.py          # Configuration validation
│   ├── view_memory.py           # Memory statistics and querying
│   ├── view_prospects.py        # View and export prospects
│   └── clear_memory.py          # Clear data from memory
├── docs/
│   ├── MEMORY.md                # Memory system documentation
│   ├── CLEAR_MEMORY.md          # Clear memory documentation
│   ├── BUGFIX_UNICODE.md        # Unicode encoding fix documentation
│   ├── PROJECT_SUMMARY.md       # Project overview
│   ├── QUICKSTART.md            # Quick start guide
│   └── SETUP.md                 # Detailed setup instructions
├── tests/
│   ├── conftest.py              # Pytest fixtures
│   ├── test_agents.py           # Agent unit tests
│   └── test_workflow.py         # End-to-end workflow tests
├── data/
│   └── chroma/                  # ChromaDB memory storage (auto-created)
├── logs/
│   └── workflow.log             # Workflow execution logs
├── prospects/                   # Exported prospect data (organized by format)
│   ├── README.md                # Prospects folder documentation
│   ├── csv/                     # CSV exports with timestamps
│   ├── json/                    # JSON exports with timestamps
│   └── xlsx/                    # Excel exports with timestamps
├── langgraph_builder.py         # Main workflow builder
├── workflow.json                # Workflow configuration
├── workflow_simple.json         # Simplified workflow config
├── demo.py                      # Example workflow execution
├── requirements.txt             # Python dependencies
├── .env.example                 # Example environment config
├── .gitignore                   # Git ignore rules
└── README.md                    # This file

⚙️ Workflow Configuration

The workflow.json file defines the entire workflow. Each step includes:

  • id: Unique identifier
  • agent: Agent class name
  • inputs: Input parameters (can reference previous steps)
  • instructions: Natural language instructions for the agent
  • tools: API configurations
  • output_schema: Expected output structure
  • next: Next step in the workflow (or null for final step)

Example Step Configuration

{
  "id": "prospect_search",
  "agent": "ProspectSearchAgent",
  "inputs": {
    "icp": {
      "industry": "SaaS",
      "location": "USA",
      "employee_count": { "min": 100, "max": 1000 }
    }
  },
  "instructions": "Search for companies matching ICP criteria",
  "tools": [
    {
      "name": "ApolloAPI",
      "config": { "api_key": "{{APOLLO_API_KEY}}" }
    }
  ],
  "next": "enrichment"
}
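
Because each step names its successor in `next`, the execution order can be recovered by following those links from the first step. The sketch below illustrates that idea only; the function name `step_order` and the explicit `start` argument are assumptions, and the project's builder may determine the order differently.

```python
from typing import Dict, List


def step_order(steps: List[dict], start: str) -> List[str]:
    """Follow each step's `next` field from `start` until it is null."""
    by_id: Dict[str, dict] = {step["id"]: step for step in steps}
    order: List[str] = []
    current = start
    while current is not None:
        if current in order:
            raise ValueError(f"Cycle detected at step {current!r}")
        if current not in by_id:
            raise ValueError(f"Unknown step id {current!r}")
        order.append(current)
        # A missing or null `next` marks the final step.
        current = by_id[current].get("next")
    return order
```

Validating the chain this way before a run catches a misspelled `next` value early.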

Input References

Steps can reference outputs from previous steps:

{
  "inputs": {
    "leads": "{{prospect_search.output.leads}}",
    "workflow_config": "{{workflow}}"
  }
}
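
One way to picture how a `{{step.output.field}}` reference is resolved is a lookup that walks a dotted path through the outputs collected so far. This is a sketch under assumptions: the function name `resolve`, the shape of the `context` dictionary, and the rule that a reference must fill the whole string are illustrative, not the project's confirmed behavior.

```python
import re
from typing import Any, Dict

# Matches a string that consists of exactly one {{dotted.path}} reference.
_REF = re.compile(r"^\{\{\s*([\w.]+)\s*\}\}$")


def resolve(value: Any, context: Dict[str, Any]) -> Any:
    """Replace {{a.b.c}} references in `value` with data from `context`."""
    if isinstance(value, dict):
        return {key: resolve(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, context) for item in value]
    if isinstance(value, str):
        match = _REF.match(value)
        if match:
            current: Any = context
            for part in match.group(1).split("."):
                current = current[part]  # raises KeyError on a bad path
            return current
    return value  # literals pass through unchanged
```

With `context = {"prospect_search": {"output": {"leads": [...]}}}`, the input `"{{prospect_search.output.leads}}"` resolves to that list of leads.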

🔧 API Setup Guide

OpenAI API

  1. Sign up at https://platform.openai.com
  2. Create an API key
  3. Add to .env: OPENAI_API_KEY=sk-...

Apollo.io

  1. Sign up at https://www.apollo.io (free tier available)
  2. Get API key from Settings → Integrations
  3. Add to .env: APOLLO_API_KEY=...

SendGrid

  1. Sign up at https://sendgrid.com (free tier: 100 emails/day)
  2. Create API key with Mail Send permissions
  3. Verify sender email
  4. Add to .env:
    SENDGRID_API_KEY=SG...
    SENDGRID_FROM_EMAIL=verified@yourdomain.com
    

Google Sheets (Optional)

  1. Create a Google Cloud project
  2. Enable Google Sheets API
  3. Create service account and download credentials.json
  4. Place credentials.json in project root
  5. Create a Google Sheet and share with service account email
  6. Add Sheet ID to .env: GOOGLE_SHEET_ID=...

📊 Output and Results

Console Output

The workflow provides real-time progress updates:

🚀 Starting workflow: OutboundLeadGeneration
✓ Created agent: ProspectSearchAgent
🤔 ProspectSearchAgent reasoning: Analyzing ICP criteria...
✅ Completed step: prospect_search (2.3s)
...
🎉 Workflow completed in 45.2s

Results File

Complete results are saved to workflow_results.json:

{
  "status": "completed",
  "outputs": {
    "prospect_search": { "leads": [...] },
    "enrichment": { "enriched_leads": [...] },
    "scoring": { "ranked_leads": [...] },
    ...
  }
}

Feedback and Recommendations

The FeedbackTrainer provides actionable insights:

Campaign Performance Summary
===========================
📧 Total Sent: 20
📖 Open Rate: 35.0%
👆 Click Rate: 8.5%
💬 Reply Rate: 4.2%

📋 Recommendations:
• outreach_content - tone
  Open rate is 35%, above average. Continue with current approach.
• scoring - min_score_threshold
  High engagement suggests we can expand targeting.
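
Rates like those in the summary can be derived from raw event counts. The sketch below assumes every rate is measured against the total number of emails sent; the project may use different denominators, and the function name `campaign_metrics` is hypothetical.

```python
from typing import Dict


def campaign_metrics(sent: int, opened: int, clicked: int, replied: int) -> Dict[str, float]:
    """Return open, click, and reply rates as percentages of emails sent."""
    if sent <= 0:
        # Avoid dividing by zero when nothing was sent.
        return {"open_rate": 0.0, "click_rate": 0.0, "reply_rate": 0.0}

    def pct(count: int) -> float:
        return round(100.0 * count / sent, 1)

    return {
        "open_rate": pct(opened),
        "click_rate": pct(clicked),
        "reply_rate": pct(replied),
    }
```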

🧪 Testing and Development

Dry Run Mode

Test without sending actual emails:

# In workflow.json, set dry_run: true
{
  "id": "send",
  "inputs": {
    "dry_run": true
  }
}
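
The effect of a `dry_run` flag can be sketched as a send loop that records what would happen without calling the delivery function. The names `send_emails` and `deliver` are assumptions for illustration; the real agent's interface may differ.

```python
from typing import Callable, Dict, List


def send_emails(
    emails: List[Dict[str, str]],
    deliver: Callable[[Dict[str, str]], None],
    dry_run: bool = True,
) -> List[Dict[str, str]]:
    """Deliver each email, or only log it when dry_run is set."""
    log = []
    for email in emails:
        if dry_run:
            status = "skipped (dry run)"  # nothing leaves the machine
        else:
            deliver(email)
            status = "sent"
        log.append({"to": email["to"], "status": status})
    return log
```

Defaulting `dry_run` to `True` in this sketch means a real send has to be requested explicitly.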

Mock Data

Agents automatically use mock data when APIs are unavailable, allowing development without all API keys.
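
A minimal sketch of such a fallback is shown below. It assumes that "unavailable" means either a missing API key or a failed call; the function `search_leads`, the `fetch_live` callable, and the sample record in `MOCK_LEADS` are all hypothetical.

```python
import os
from typing import Callable, List, Mapping

# Illustrative placeholder data, not real prospects.
MOCK_LEADS = [{"company": "Example SaaS Inc.", "email": "jane.doe@example.com"}]


def search_leads(
    fetch_live: Callable[[str], List[dict]],
    env: Mapping[str, str] = os.environ,
) -> List[dict]:
    """Use the live API when a key is configured, otherwise mock data."""
    api_key = env.get("APOLLO_API_KEY", "")
    if not api_key:
        return list(MOCK_LEADS)
    try:
        return fetch_live(api_key)
    except Exception:
        # Broad on purpose in this sketch: any API failure falls back to mocks.
        return list(MOCK_LEADS)
```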

Logging

Logs are written to ./logs/workflow.log with detailed execution traces.

🔄 Extending the System

Adding a New Agent

  1. Create agent class in agents/:
from agents.base_agent import BaseAgent

class MyCustomAgent(BaseAgent):
    def _act(self, inputs, reasoning):
        # Your logic here
        return {"result": "success"}
  2. Register in factory (agents/__init__.py):
AGENT_CLASSES = {
    ...
    "MyCustomAgent": MyCustomAgent
}
  3. Add to workflow.json:
{
  "id": "my_step",
  "agent": "MyCustomAgent",
  "inputs": {...},
  "instructions": "...",
  "next": "next_step"
}
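
The three steps above can be tied together in one self-contained sketch. The `BaseAgent` here is a stand-in with an assumed shape (a `run` method that reasons and then acts); it is not the project's agents/base_agent.py, and `create_agent` is a hypothetical factory helper.

```python
from typing import Any, Dict, Type


class BaseAgent:
    """Stand-in base class (assumed shape, not the project's real BaseAgent)."""

    def __init__(self, instructions: str = "") -> None:
        self.instructions = instructions

    def _reason(self, inputs: Dict[str, Any]) -> str:
        # Placeholder for the reasoning half of a reason-then-act loop.
        return f"Plan for: {self.instructions}"

    def _act(self, inputs: Dict[str, Any], reasoning: str) -> Dict[str, Any]:
        raise NotImplementedError

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return self._act(inputs, self._reason(inputs))


class MyCustomAgent(BaseAgent):
    def _act(self, inputs: Dict[str, Any], reasoning: str) -> Dict[str, Any]:
        return {"result": "success", "reasoning": reasoning}


# Maps the "agent" field of a workflow step to a class.
AGENT_CLASSES: Dict[str, Type[BaseAgent]] = {"MyCustomAgent": MyCustomAgent}


def create_agent(step: Dict[str, Any]) -> BaseAgent:
    """Instantiate the agent class named by a workflow step."""
    name = step.get("agent")
    if name not in AGENT_CLASSES:
        raise ValueError(f"Unknown agent: {name!r}")
    return AGENT_CLASSES[name](step.get("instructions", ""))
```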

Modifying Workflow Logic

Edit workflow.json to:

  • Change ICP criteria
  • Adjust scoring weights
  • Modify outreach tone
  • Add/remove workflow steps
  • Change step sequencing

No code changes required!

📈 Performance Optimization

Rate Limiting

Configure delays between API calls:

{
  "inputs": {
    "send_delay_seconds": 60
  }
}
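
A fixed delay between sends can be sketched as a loop that pauses before every message except the first. The function `send_with_delay` and its injectable `sleep` parameter are assumptions for illustration; injecting `sleep` keeps the loop testable without real waiting.

```python
import time
from typing import Callable, Iterable


def send_with_delay(
    recipients: Iterable[str],
    send: Callable[[str], None],
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send to each recipient, pausing `delay_seconds` between sends."""
    count = 0
    for recipient in recipients:
        if count > 0:
            sleep(delay_seconds)  # no pause before the first send
        send(recipient)
        count += 1
    return count
```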

Batch Processing

Limit prospects per run:

{
  "inputs": {
    "max_results": 50,
    "max_leads_to_contact": 20
  }
}
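
The two limits can be read as a cap on how many prospects are kept from the search and a second cap on how many of those are contacted. The sketch below assumes each prospect carries a numeric `score` and that the highest-scoring ones are contacted first; both are illustrative assumptions, as is the function name `select_batch`.

```python
from typing import List


def select_batch(
    prospects: List[dict],
    max_results: int = 50,
    max_leads_to_contact: int = 20,
) -> List[dict]:
    """Keep the first `max_results` prospects, then pick the top scorers."""
    found = prospects[:max_results]
    ranked = sorted(found, key=lambda p: p.get("score", 0), reverse=True)
    return ranked[:max_leads_to_contact]
```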

Caching

The system automatically caches enrichment data in the state object.
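
Caching in a state object can be pictured as a dictionary lookup that calls the enrichment API only on a miss. The key name `enrichment_cache` and the function `enrich_with_cache` are assumptions for this sketch; the project's state layout is not shown in this README.

```python
from typing import Any, Callable, Dict


def enrich_with_cache(
    state: Dict[str, Any],
    domain: str,
    enrich: Callable[[str], dict],
) -> dict:
    """Return enrichment data for `domain`, calling `enrich` at most once."""
    cache = state.setdefault("enrichment_cache", {})
    if domain not in cache:
        cache[domain] = enrich(domain)  # only reached on a cache miss
    return cache[domain]
```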

🐛 Troubleshooting

Common Issues

Import errors for langgraph/langchain:

pip install --upgrade langgraph langchain langchain-openai

API authentication failures:

  • Verify API keys in .env
  • Check API key permissions/scopes
  • Ensure API quotas are not exceeded

Google Sheets errors:

  • Verify credentials.json exists
  • Check that the service account email has edit access to the sheet
  • Confirm Sheet ID is correct

Email sending failures:

  • Verify sender email in SendGrid
  • Check SendGrid API key permissions
  • Ensure you are not exceeding rate limits

📝 Best Practices

  1. Start with a dry run to test the workflow without sending emails
  2. Use mock data during development to avoid API costs
  3. Monitor API quotas to avoid service interruptions
  4. Review feedback regularly to optimize performance
  5. Version control your workflow.json configurations
  6. Rotate API keys periodically for security

🤝 Contributing

This is a demonstration project. To extend or customize:

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes
  4. Test thoroughly
  5. Submit a pull request with a clear description

📄 License

This project is provided as-is for demonstration purposes.

📧 Contact

For questions about this implementation:

📚 Additional Documentation

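For more detailed information, see the docs folder (docs/), which includes MEMORY.md, CLEAR_MEMORY.md, PROJECT_SUMMARY.md, QUICKSTART.md, and SETUP.md.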

📁 Prospects Export

The prospects/ folder contains exported lead data organized by format:

  • prospects/csv/ - CSV files with timestamp (e.g., prospects_20251019_143025.csv)
  • prospects/json/ - JSON files with timestamp (e.g., prospects_20251019_143025.json)
  • prospects/xlsx/ - Excel files with timestamp (e.g., prospects_20251019_143025.xlsx)

Use python scripts\view_prospects.py --help for export options.

🎥 Demo Video

A demo video walkthrough is available showing:

  • Complete workflow execution
  • Agent reasoning and decision-making
  • Performance analysis and recommendations
  • Architecture and design choices

Built with: LangGraph, LangChain, OpenAI GPT-4o-mini, Python 3.9+

Targets: B2B SaaS companies, $20M-$200M revenue, 100-1000 employees, USA

Approach: AI-assisted development using Cursor and Claude (vibe coding encouraged!)
