Docker Containerization Guide

Run the AI Web Crawler in a Docker container for easy deployment and portability.

Prerequisites

  • Docker installed (Get Docker)
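  • Docker Compose installed (usually comes with Docker Desktop). The commands below use the standalone docker-compose binary; with the Compose V2 plugin, write docker compose instead.
  • An OpenAI API key, stored in a .env file (created in step 1 below)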

Quick Start

1. Set Up Environment Variables

Create a .env file containing your API key and the model name. The first command overwrites any existing .env:

echo "OPENAI_API_KEY=your_actual_key_here" > .env
echo "LLM_MODEL=gpt-4-turbo-preview" >> .env

2. Build and Run with Docker Compose

# Build and start the container
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the container
docker-compose down

The app will be available at: http://localhost:8501

Alternative: Docker Commands

Build the Image

docker build -t ai-web-crawler:latest .

Run the Container

docker run -d \
  --name ai-web-crawler \
  -p 8501:8501 \
  -v "$(pwd)/outputs:/app/outputs" \
  -v "$(pwd)/reports:/app/reports" \
  -e OPENAI_API_KEY=your_key_here \
  ai-web-crawler:latest

View Logs

docker logs -f ai-web-crawler

Stop Container

docker stop ai-web-crawler
docker rm ai-web-crawler

Container Features

✅ Included in Container

  • Python 3.13
  • All Python dependencies
  • Playwright with Chromium browser
  • System libraries for browser automation
  • Streamlit web server

📁 Persistent Data

Two host directories are mounted into the container so that their contents survive container removal:

  • ./outputs/ - Crawled data
  • ./reports/ - Generated reports
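
When a bind-mounted host path does not exist, the Docker daemon on Linux typically creates it itself, and the directory ends up owned by root. A minimal sketch for creating both directories up front as the current user; prepare_mounts is a hypothetical helper name, not part of the project:

```shell
# Create the host-side mount points before the first `docker run`,
# so they belong to the current user instead of being auto-created.
# prepare_mounts is an illustrative helper, not part of the project.
prepare_mounts() {
  local base="${1:-$PWD}"   # directory that should contain outputs/ and reports/
  mkdir -p "$base/outputs" "$base/reports"
}
```

Call prepare_mounts from the project directory (or pass a path such as /data) before starting the container. The call is safe to repeat because mkdir -p leaves existing directories alone.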

🔧 Environment Variables

Configure via .env file or pass directly:

Variable                 Description         Default
OPENAI_API_KEY           OpenAI API key      Required
ANTHROPIC_API_KEY        Anthropic API key   Optional
LLM_MODEL                Model to use        gpt-4-turbo-preview
MAX_CRAWL_DEPTH          Pages per site      3
DEFAULT_SEARCH_RESULTS   Sites to crawl      5
CRAWL_TIMEOUT            Timeout (seconds)   30
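
To see which values would apply in your current shell, the defaults from the table can be mirrored with parameter expansion. This is only an illustration: how the application itself reads these variables is not shown in this guide, and print_effective_config is a hypothetical helper.

```shell
# Print the settings that would apply, falling back to the documented defaults.
# Illustrative only: the application's own config loading may differ.
print_effective_config() {
  # OPENAI_API_KEY is the one variable the table marks as required.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is required but not set" >&2
    return 1
  fi
  echo "LLM_MODEL=${LLM_MODEL:-gpt-4-turbo-preview}"
  echo "MAX_CRAWL_DEPTH=${MAX_CRAWL_DEPTH:-3}"
  echo "DEFAULT_SEARCH_RESULTS=${DEFAULT_SEARCH_RESULTS:-5}"
  echo "CRAWL_TIMEOUT=${CRAWL_TIMEOUT:-30}"
}
```

The function never prints the API key itself, so its output is safe to paste into a bug report.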

Useful Commands

The docker stats, docker exec, and docker inspect commands in this guide refer to the container as ai-web-crawler-bootcamp, which is presumably the name set in docker-compose.yml. If you started the container with docker run as shown above, use ai-web-crawler instead.

Check Container Status

docker-compose ps

View Resource Usage

docker stats ai-web-crawler-bootcamp

Access Container Shell

docker exec -it ai-web-crawler-bootcamp /bin/bash

Restart Container

docker-compose restart

View Application Logs

docker-compose logs -f --tail=100

Rebuild After Code Changes

docker-compose up -d --build

Troubleshooting

Port Already in Use

Change port in docker-compose.yml:

ports:
  - "8502:8501"  # Use port 8502 instead
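
Before editing the mapping, it can help to find a host port that is actually free. A rough sketch using bash's /dev/tcp pseudo-device; port_in_use and first_free_port are hypothetical helper names, and the check only probes 127.0.0.1, so a service bound to another interface would go unnoticed:

```shell
#!/usr/bin/env bash
# Succeeds if something accepts TCP connections on the given local port.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port at or above the starting port that nothing is listening on.
first_free_port() {
  local port="${1:-8501}"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

For example, first_free_port 8501 prints a candidate for the left-hand side of the mapping; the container-side port stays 8501.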

Out of Memory

Increase memory limit in docker-compose.yml:

deploy:
  resources:
    limits:
      memory: 4G

Browser Not Working

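Install Playwright's system dependencies inside the running container. The change is lost when the container is recreated, so for a permanent fix add the same command to the Dockerfile and rebuild: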

docker exec -it ai-web-crawler-bootcamp playwright install-deps

Environment Variables Not Loading

Check that the .env file exists in the same directory as docker-compose.yml. Note that cat prints your API key to the terminal:

ls -la .env
cat .env
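
If you would rather not echo secrets to the screen, a small check can report which variables are defined without showing their values. A sketch under the assumption that the file uses plain NAME=value lines; check_env_file is a hypothetical helper, and the placeholder test simply looks for the your_ prefix used in this guide's examples:

```shell
# Verify that a .env file exists and defines a real-looking OPENAI_API_KEY,
# then list the variable names it sets without revealing any values.
check_env_file() {
  local file="${1:-.env}"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  if ! grep -Eq '^OPENAI_API_KEY=.+' "$file"; then
    echo "OPENAI_API_KEY is missing or empty in $file" >&2
    return 1
  fi
  # The examples in this guide use placeholders beginning with "your_".
  if grep -Eq '^OPENAI_API_KEY=your_' "$file"; then
    echo "OPENAI_API_KEY still holds the placeholder value" >&2
    return 1
  fi
  # Print names only; everything after "=" is dropped.
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1 is set/p' "$file"
}
```

Run check_env_file from the directory that contains docker-compose.yml, or pass the path to the file as the first argument.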

Production Deployment

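Build Optimized Image

The --no-cache flag forces every layer to be rebuilt, so the image picks up current base-image and dependency updates: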

docker build --no-cache -t ai-web-crawler:v1.0 .

Push to Registry

# Tag for Docker Hub
docker tag ai-web-crawler:v1.0 yourusername/ai-web-crawler:v1.0

# Push
docker push yourusername/ai-web-crawler:v1.0

Run in Production

docker run -d \
  --name ai-web-crawler \
  --restart unless-stopped \
  -p 8501:8501 \
  -v /data/outputs:/app/outputs \
  -v /data/reports:/app/reports \
  --env-file .env \
  --memory="2g" \
  --cpus="2" \
  yourusername/ai-web-crawler:v1.0

Security Best Practices

  1. Never commit the .env file. It is already listed in .gitignore.
  2. Use secrets management for production (AWS Secrets Manager, Azure Key Vault)
  3. Run as non-root user (add to Dockerfile if needed)
  4. Scan images for vulnerabilities. The docker scan command has been retired in favor of Docker Scout:
    docker scout cves ai-web-crawler:latest
  5. Keep base image updated:
    docker pull python:3.13-slim
    docker-compose build --no-cache
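
To confirm whether item 3 is already satisfied, you can ask a running container which user its processes run as. A minimal sketch; container_is_nonroot is a hypothetical helper, and it assumes the image ships the id utility, which Debian-based Python images normally do:

```shell
# Succeeds if commands in the named container run as a non-root user.
# Returns 2 if the container cannot be queried at all.
container_is_nonroot() {
  local name="$1"
  local uid
  # docker exec runs as the container's configured default user.
  uid=$(docker exec "$name" id -u) || return 2
  [ "$uid" != "0" ]
}
```

For example: container_is_nonroot ai-web-crawler && echo "non-root" || echo "root or not running".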

Health Checks

The container includes automatic health checks:

  • Endpoint: http://localhost:8501/_stcore/health
  • Interval: 30 seconds
  • Timeout: 10 seconds
  • Retries: 3

Check health status:

docker inspect --format='{{.State.Health.Status}}' ai-web-crawler-bootcamp
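
In deployment scripts it is often useful to block until the container reports healthy. The following sketch polls the same docker inspect command; wait_healthy is a hypothetical helper, and the attempt count and delay are arbitrary defaults rather than values taken from the project:

```shell
# Poll a container's health status until it is "healthy", it turns
# "unhealthy", or the attempts run out.
wait_healthy() {
  local name="${1:-ai-web-crawler-bootcamp}"
  local attempts="${2:-20}"   # arbitrary default
  local delay="${3:-3}"       # seconds between polls, arbitrary default
  local status="" i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(docker inspect --format='{{.State.Health.Status}}' "$name" 2>/dev/null) || status=""
    case "$status" in
      healthy)   echo "healthy"; return 0 ;;
      unhealthy) echo "unhealthy" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $name (last status: ${status:-unknown})" >&2
  return 1
}
```

A typical use is docker-compose up -d && wait_healthy, so that later steps run only once the Streamlit server is answering its health endpoint.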

Multi-Stage Build (Optional)

For smaller images, use multi-stage builds:

# Build stage: compile wheels for every dependency
FROM python:3.13-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt

# Runtime stage: install the prebuilt wheels, leaving build tooling behind
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/*
# ... rest of Dockerfile

Need Help?