Get up and running in 5 minutes!
cd ai-web-crawler-bootcamp
./setup.shThis will:
- ✅ Check Python version
- ✅ Create virtual environment
- ✅ Install all dependencies
- ✅ Install Playwright browsers
- ✅ Create configuration files
Edit the .env file:
nano .envAdd your API key (choose one):
For OpenAI:
OPENAI_API_KEY=sk-your-actual-key-here
LLM_MODEL=gpt-4-turbo-previewFor Azure OpenAI:
AZURE_API_KEY=your-key-here
AZURE_API_BASE=https://your-resource.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview
LLM_MODEL=azure/gpt-4For Anthropic Claude:
ANTHROPIC_API_KEY=your-key-here
LLM_MODEL=claude-3-opus-20240229Save and exit (Ctrl+X, then Y, then Enter)
# Make sure virtual environment is activated
source venv/bin/activate
# Start the web interface
streamlit run app.pyYour browser should open automatically to http://localhost:8501
If not, open it manually.
In the web interface:
- Go to the Search-Based tab
- Enter:
Software development consultancy finland - Number of results:
5 - Click Start Search-Based Crawl
- Wait 5-10 minutes (AI is working!)
- View results and download reports
Or run from command line:
python orchestrator.pyThis runs the example automatically.
Test that everything works:
python -c "import litellm, playwright, streamlit; print('✅ All packages installed')"- Web Interface: Clean Streamlit UI with two tabs
- Search Tab: Input field for search term and number selector
- CSV Tab: File upload area with sample download
- Sidebar: Configuration status and about info
After your first crawl:
- Results Table: Shows all companies with their values
- Download Buttons: Get Excel or CSV
- Company Details: Expand to see individual analysis
- Files Created: Check
./reports/directory
Virtual environment not activated?
source venv/bin/activatePort 8501 already in use?
streamlit run app.py --server.port 8502Import errors?
pip install -r requirements.txtPlaywright browser missing?
playwright install chromium- ✅ Read
README.mdfor full documentation - ✅ Check
EXAMPLE_USAGE.mdfor detailed examples - ✅ Review
API_DOCS.mdfor programmatic usage - ✅ Try the sample CSV:
sample_companies.csv
That's it! You now have a working AI-powered web crawler.
Pro tip: Start with 2-3 companies to test before running larger batches.
Need help?
- Check the logs in your terminal
- Review the troubleshooting section in README.md
- Make sure your API key is valid and has sufficient quota