Quick Start Guide

Get up and running in 5 minutes!

1️⃣ One-Command Setup (Recommended)

cd ai-web-crawler-bootcamp
./setup.sh

This will:

  • ✅ Check Python version
  • ✅ Create virtual environment
  • ✅ Install all dependencies
  • ✅ Install Playwright browsers
  • ✅ Create configuration files
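The contents of `setup.sh` are not reproduced in this guide. As a rough illustration of the kinds of checks listed above, here is a minimal Python sketch that reports whether the Python version, virtual environment and `.env` file look right. `MIN_PYTHON` is a hypothetical minimum chosen for illustration, not a value taken from `setup.sh`. The `venv` directory name matches the `source venv/bin/activate` command used later in this guide.

```python
"""Hypothetical preflight check, loosely mirroring part of what ./setup.sh does."""
import sys
from pathlib import Path

# ASSUMPTION: illustrative minimum version; check setup.sh/README.md for the real one.
MIN_PYTHON = (3, 9)


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def preflight(project_dir: Path = Path(".")) -> dict:
    """Return a mapping of check name -> pass/fail."""
    return {
        "python_version_ok": sys.version_info[:2] >= MIN_PYTHON,
        "virtualenv_active": in_virtualenv(),
        "venv_dir_exists": (project_dir / "venv").is_dir(),
        "env_file_exists": (project_dir / ".env").is_file(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'✅' if ok else '❌'} {name}")
```

Run it from the project root after `./setup.sh` finishes. Any ❌ line points to the step that needs attention.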

2️⃣ Add Your API Key

Edit the .env file:

nano .env

Add your API key (choose one):

For OpenAI:

OPENAI_API_KEY=sk-your-actual-key-here
LLM_MODEL=gpt-4-turbo-preview

For Azure OpenAI:

AZURE_API_KEY=your-key-here
AZURE_API_BASE=https://your-resource.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview
LLM_MODEL=azure/gpt-4

For Anthropic Claude:

ANTHROPIC_API_KEY=your-key-here
LLM_MODEL=claude-3-opus-20240229

Save and exit nano (Ctrl+X, then Y, then Enter).
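To confirm that `.env` configures exactly one provider, you could use a small checker like the sketch below. The variable names and placeholder values come from the examples above. The parser is a deliberately simple stand-in for a real `.env` loader (it handles `KEY=VALUE` lines, `#` comments and surrounding quotes, nothing more), and the helper names are hypothetical.

```python
"""Hypothetical helper: report which LLM provider the .env file configures."""
from pathlib import Path

# Variables each provider needs, as listed in this Quick Start.
PROVIDERS = {
    "openai": ["OPENAI_API_KEY", "LLM_MODEL"],
    "azure": ["AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION", "LLM_MODEL"],
    "anthropic": ["ANTHROPIC_API_KEY", "LLM_MODEL"],
}

# Example values from this guide that must be replaced with real ones.
PLACEHOLDERS = {
    "sk-your-actual-key-here",
    "your-key-here",
    "https://your-resource.openai.azure.com",
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict) -> list:
    """Providers whose required variables are all set to non-placeholder values."""
    return [
        name
        for name, keys in PROVIDERS.items()
        if all(env.get(k) and env[k] not in PLACEHOLDERS for k in keys)
    ]


if __name__ == "__main__":
    env = parse_env(Path(".env").read_text(encoding="utf-8"))
    print(configured_providers(env) or "No provider fully configured")
```

An empty result usually means a placeholder value was left in place or a variable is misspelled.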

3️⃣ Run the Application

# Make sure virtual environment is activated
source venv/bin/activate

# Start the web interface
streamlit run app.py

Your browser should open http://localhost:8501 automatically. If it does not, open that URL manually.

4️⃣ Try the Example

In the web interface:

  1. Go to the Search-Based tab
  2. Enter: Software development consultancy finland
  3. Number of results: 5
  4. Click Start Search-Based Crawl
  5. Wait 5-10 minutes (AI is working!)
  6. View results and download reports

Or run from command line:

python orchestrator.py

This runs the example automatically.

✅ Verify Installation

Test that everything works:

python -c "import litellm, playwright, streamlit; print('✅ All packages installed')"
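If that one-liner fails, it stops at the first missing package. The following minimal sketch (a hypothetical helper, not part of the project) names every missing package at once. It uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
"""Hypothetical variant of the import check that lists each missing package."""
import importlib.util

# Top-level import names checked by the one-liner above.
REQUIRED = ["litellm", "playwright", "streamlit"]


def missing_packages(names=REQUIRED) -> list:
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("❌ Missing:", ", ".join(missing), "- run: pip install -r requirements.txt")
    else:
        print("✅ All packages installed")
```

Run it with the virtual environment activated. Otherwise it checks the system interpreter instead of `venv`.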

🎯 What You Should See

  1. Web Interface: Clean Streamlit UI with two tabs
  2. Search Tab: Input field for search term and number selector
  3. CSV Tab: File upload area with sample download
  4. Sidebar: Configuration status and about info

📊 First Results

After your first crawl:

  1. Results Table: Shows all companies with their values
  2. Download Buttons: Get Excel or CSV
  3. Company Details: Expand to see individual analysis
  4. Files Created: Check the ./reports/ directory
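This guide does not specify the names of the report files. The sketch below lists the newest files in `./reports/`, assuming the reports use `.csv` or `.xlsx` extensions to match the Excel/CSV download options. The suffix set and the helper name are assumptions.

```python
"""Hypothetical helper: list the most recent report files in ./reports/."""
from pathlib import Path

# ASSUMPTION: report formats inferred from the Excel/CSV download buttons.
REPORT_SUFFIXES = {".csv", ".xlsx"}


def latest_reports(reports_dir: Path = Path("reports"), limit: int = 5) -> list:
    """Return up to `limit` report files, newest first by modification time."""
    if not reports_dir.is_dir():
        return []
    files = [
        p for p in reports_dir.iterdir()
        if p.is_file() and p.suffix.lower() in REPORT_SUFFIXES
    ]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in latest_reports():
        print(path)
```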

🐛 Quick Troubleshooting

Virtual environment not activated?

source venv/bin/activate

Port 8501 already in use?

streamlit run app.py --server.port 8502

Import errors?

pip install -r requirements.txt

Playwright browser missing?

playwright install chromium
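For the "port already in use" case, you don't have to guess the next free port. This minimal sketch probes ports upward from 8501 and prints a matching `--server.port` command. It treats a port as busy only if something accepts a TCP connection on `127.0.0.1`. That covers a running Streamlit instance, but not every way a port can be reserved. The helper names are hypothetical.

```python
"""Hypothetical helper: find a free local port for Streamlit, starting at 8501."""
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when a listener accepts the connection.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8501, attempts: int = 20) -> int:
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"No free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print(f"streamlit run app.py --server.port {first_free_port()}")
```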

📚 Next Steps

  • ✅ Read README.md for full documentation
  • ✅ Check EXAMPLE_USAGE.md for detailed examples
  • ✅ Review API_DOCS.md for programmatic usage
  • ✅ Try the sample CSV: sample_companies.csv

🚀 You're Ready!

That's it! You now have a working AI-powered web crawler.

Pro tip: Start with 2-3 companies to test before running larger batches.
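One way to build a small trial batch is to copy the header and the first few rows of `sample_companies.csv` into a separate file for the CSV tab. The sketch below does this with the standard `csv` module and keeps whatever columns the file has. The output filename `sample_companies_small.csv` and the helper name are hypothetical.

```python
"""Hypothetical helper: cut sample_companies.csv down to a few rows for a test run."""
import csv
from pathlib import Path


def head_csv(src: Path, dst: Path, rows: int = 3) -> int:
    """Write the header plus the first `rows` data rows of src to dst.

    Returns the number of data rows written.
    """
    with src.open(newline="", encoding="utf-8") as fin, \
         dst.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:  # empty input file
            return 0
        writer.writerow(header)
        written = 0
        for row in reader:
            if written >= rows:
                break
            writer.writerow(row)
            written += 1
        return written


if __name__ == "__main__":
    n = head_csv(Path("sample_companies.csv"), Path("sample_companies_small.csv"))
    print(f"Wrote {n} companies to sample_companies_small.csv")
```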


Need help?

  • Check the logs in your terminal
  • Review the troubleshooting section in README.md
  • Make sure your API key is valid and has sufficient quota