Run the AI Web Crawler in a Docker container for easy deployment and portability.
- Docker installed (Get Docker)
- Docker Compose installed (usually comes with Docker Desktop)
- Your API key in a .env file
Create a .env file with your API key:
echo "OPENAI_API_KEY=your_actual_key_here" > .env
echo "LLM_MODEL=gpt-4-turbo-preview" >> .env
# Build and start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the container
docker-compose down
The app will be available at: http://localhost:8501
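Because the container depends on the key in .env, it can help to check the file before starting. The sketch below is one way to do that; `check_env_file` is a hypothetical helper name, and it treats the placeholder value from the example above as "not set".

```shell
# Hypothetical helper: check that an env file defines a real OPENAI_API_KEY.
# Usage: check_env_file [path]   (defaults to ./.env)
check_env_file() {
  local env_file="${1:-.env}"

  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 1
  fi

  # Use the last OPENAI_API_KEY assignment if the key appears more than once.
  local key
  key="$(grep -E '^OPENAI_API_KEY=' "$env_file" | tail -n 1 | cut -d= -f2-)"

  if [ -z "$key" ] || [ "$key" = "your_actual_key_here" ]; then
    echo "OPENAI_API_KEY is empty or still the placeholder in $env_file" >&2
    return 1
  fi
  return 0
}
```

A possible use is `check_env_file .env && docker-compose up -d`, so the container only starts when the key looks real.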
# Build the image
docker build -t ai-web-crawler:latest .
# Run the container
docker run -d \
--name ai-web-crawler \
-p 8501:8501 \
-v "$(pwd)/outputs:/app/outputs" \
-v "$(pwd)/reports:/app/reports" \
-e OPENAI_API_KEY=your_key_here \
ai-web-crawler:latest
# View logs
docker logs -f ai-web-crawler
# Stop and remove the container
docker stop ai-web-crawler
docker rm ai-web-crawler
The image includes:
- Python 3.13
- All Python dependencies
- Playwright with Chromium browser
- System libraries for browser automation
- Streamlit web server
Volumes are mounted to persist:
- ./outputs/ - Crawled data
- ./reports/ - Generated reports
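When a bind-mounted host path does not exist, Docker creates it, and on a typical Linux setup the new directory is owned by root. Creating the two directories yourself beforehand avoids that. A minimal sketch, where `ensure_mount_dirs` is a hypothetical helper name:

```shell
# Hypothetical helper: create the host directories that back the volume mounts.
# Usage: ensure_mount_dirs [base_dir]   (defaults to the current directory)
ensure_mount_dirs() {
  local base_dir="${1:-.}"
  # -p makes this safe to run repeatedly; existing directories are left alone.
  mkdir -p "$base_dir/outputs" "$base_dir/reports"
}
```

Running `ensure_mount_dirs` from the project directory before the first start keeps both directories owned by your own user.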
Configure via the .env file or pass variables directly with -e:
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | Required |
| ANTHROPIC_API_KEY | Anthropic API key | Optional |
| LLM_MODEL | Model to use | gpt-4-turbo-preview |
| MAX_CRAWL_DEPTH | Pages per site | 3 |
| DEFAULT_SEARCH_RESULTS | Sites to crawl | 5 |
| CRAWL_TIMEOUT | Timeout (seconds) | 30 |
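To see which values would apply when some variables are left unset, the defaults from the table can be mirrored with shell parameter expansion. This is only a sketch for checking your settings on the host: `print_crawler_config` is a hypothetical name, and how the application itself applies its defaults is not shown here.

```shell
# Print the effective crawler settings, falling back to the defaults from
# the table when a variable is unset or empty.
# print_crawler_config is a hypothetical helper, not part of the project.
print_crawler_config() {
  echo "LLM_MODEL=${LLM_MODEL:-gpt-4-turbo-preview}"
  echo "MAX_CRAWL_DEPTH=${MAX_CRAWL_DEPTH:-3}"
  echo "DEFAULT_SEARCH_RESULTS=${DEFAULT_SEARCH_RESULTS:-5}"
  echo "CRAWL_TIMEOUT=${CRAWL_TIMEOUT:-30}"
}
```

An override such as `-e MAX_CRAWL_DEPTH=2` on `docker run` corresponds to exporting `MAX_CRAWL_DEPTH=2` before calling the function.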
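# Check container status
docker-compose ps
# View resource usage
docker stats ai-web-crawler-bootcamp
# Open a shell inside the container
docker exec -it ai-web-crawler-bootcamp /bin/bash
# Restart the container
docker-compose restart
# Follow logs, starting from the last 100 lines
docker-compose logs -f --tail=100
# Rebuild the image and restart
docker-compose up -d --build
Change the host port in docker-compose.yml (for example, if 8501 is already in use):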
ports:
  - "8502:8501"  # Use port 8502 instead
Increase memory limit in docker-compose.yml:
deploy:
  resources:
    limits:
      memory: 4G
Ensure Playwright dependencies are installed:
docker exec -it ai-web-crawler-bootcamp playwright install-deps
Check that the .env file exists and is in the same directory as docker-compose.yml:
ls -la .env
cat .env
# Build a versioned image without using the cache
docker build --no-cache -t ai-web-crawler:v1.0 .
# Tag for Docker Hub
docker tag ai-web-crawler:v1.0 yourusername/ai-web-crawler:v1.0
# Push
docker push yourusername/ai-web-crawler:v1.0
# Run with a restart policy and resource limits
docker run -d \
--name ai-web-crawler \
--restart unless-stopped \
-p 8501:8501 \
-v /data/outputs:/app/outputs \
-v /data/reports:/app/reports \
--env-file .env \
--memory="2g" \
--cpus="2" \
yourusername/ai-web-crawler:v1.0
- Never commit the .env file (it is already in .gitignore)
- Use secrets management for production (AWS Secrets Manager, Azure Key Vault)
- Run as a non-root user (add to Dockerfile if needed)
- Scan images for vulnerabilities (docker scan has been removed from recent Docker releases in favor of Docker Scout):
docker scout cves ai-web-crawler:latest
- Keep base image updated:
docker pull python:3.13-slim
docker-compose build --no-cache
The container includes automatic health checks:
- Endpoint: http://localhost:8501/_stcore/health
- Interval: 30 seconds
- Timeout: 10 seconds
- Retries: 3
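The same endpoint can be polled from the host, for example in a deploy script that should wait until the app is up. The sketch below reuses the interval, timeout, and retry values listed above as its defaults; `wait_for_health` is a hypothetical helper and does not replace the container's built-in check.

```shell
# Hypothetical helper: poll the health endpoint until it answers or retries run out.
# Usage: wait_for_health [url] [retries] [timeout_seconds] [delay_seconds]
wait_for_health() {
  local url="${1:-http://localhost:8501/_stcore/health}"
  local retries="${2:-3}"
  local timeout="${3:-10}"
  local delay="${4:-30}"
  local attempt=1

  while [ "$attempt" -le "$retries" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: per-request timeout
    if curl -fs --max-time "$timeout" "$url" >/dev/null; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    # Wait between attempts, but not after the final failure.
    if [ "$attempt" -le "$retries" ]; then
      sleep "$delay"
    fi
  done

  echo "unhealthy after $retries attempt(s)" >&2
  return 1
}
```

For a quicker check right after startup, shorter values can be passed, such as `wait_for_health http://localhost:8501/_stcore/health 5 5 2`.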
Check health status:
docker inspect --format='{{.State.Health.Status}}' ai-web-crawler-bootcamp
For smaller images, use multi-stage builds:
# Build stage
FROM python:3.13-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt
# Runtime stage
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/*
# ... rest of Dockerfile
- Check logs: docker-compose logs -f
- Inspect container: docker inspect ai-web-crawler-bootcamp
- Test manually: Open http://localhost:8501 in a browser
- Report issues: Check TROUBLESHOOTING.md