ActivateScraper is an asynchronous web scraper that fetches and processes event data for ActivateUTS clubs. It queries the hidden ActivateUTS backend API directly to retrieve real-time data.
- Asynchronous HTTP requests using httpx
- Rate limiting and request throttling
- Data validation using Pydantic models
- JSON output with proper formatting
- Error handling and logging
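As a rough illustration of the request throttling listed above, the sketch below caps the number of coroutines in flight with an asyncio.Semaphore. It is not taken from the project's code: the function name throttled_gather and the max_concurrent and delay parameters are hypothetical, and in the real scraper each job would presumably wrap an httpx request.

```python
import asyncio


async def throttled_gather(jobs, max_concurrent=5, delay=0.0):
    """Run zero-argument coroutine functions, at most max_concurrent at a time.

    Hypothetical helper for illustration; not part of ActivateScraper.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(job):
        # Each job waits here until one of the max_concurrent slots is free.
        async with semaphore:
            result = await job()
            if delay:
                # Keep holding the slot briefly to space out requests.
                await asyncio.sleep(delay)
            return result

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(run(job) for job in jobs))
```

A semaphore limits concurrency rather than requests per second; the optional delay is a crude way to approximate the latter.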
- Python 3.7+
- httpx
- asyncio (part of the Python standard library)
- pydantic
- Clone the repository:
git clone https://github.com/tihhh/activatescraper.git
cd activatescraper
- Install dependencies:
pip install -r requirements.txt
Run the scraper:
python main.py
The scraper will:
- Load club endpoints from files/club_paths.json (data taken from the ActivateUTS sitemap)
- Fetch event data for each club asynchronously
- Save the results to data/club_data.json
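The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in main.py: it assumes files/club_paths.json holds a JSON list of path strings, and scrape_clubs and the fetch_events callable are hypothetical names. In the actual project, fetch_events would presumably issue the request through httpx.

```python
import asyncio
import json
from pathlib import Path


async def scrape_clubs(paths_file, output_file, fetch_events):
    """Load club paths, fetch each club concurrently, write results as JSON.

    fetch_events is a hypothetical async callable: path -> event data.
    Assumes paths_file contains a JSON list of strings.
    """
    club_paths = json.loads(Path(paths_file).read_text(encoding="utf-8"))

    # return_exceptions=True keeps one failing club from aborting the rest.
    results = await asyncio.gather(
        *(fetch_events(path) for path in club_paths),
        return_exceptions=True,
    )

    # Keep only the clubs whose fetch succeeded.
    data = {
        path: result
        for path, result in zip(club_paths, results)
        if not isinstance(result, Exception)
    }

    out = Path(output_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data
```

Passing the fetch function in as an argument keeps the pipeline easy to exercise with a fake fetcher, without touching the network.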
The scraper includes:
- Rate limit detection and retry mechanism
- Error logging for failed requests
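A retry mechanism of this kind might look like the sketch below. The README does not say how rate limiting is detected, so this assumes the API signals it with HTTP status 429 and retries with exponential backoff. The names fetch_with_retry and send, and the max_retries and base_delay parameters, are hypothetical; send stands in for whatever coroutine performs the actual request.

```python
import asyncio
import logging

logger = logging.getLogger("activatescraper.sketch")


async def fetch_with_retry(send, url, max_retries=3, base_delay=1.0):
    """Call send(url) and retry with backoff while the response is rate limited.

    send is a hypothetical async callable returning (status_code, body).
    Assumes rate limiting is signalled by HTTP 429. Returns the body,
    or None if the request ultimately fails.
    """
    for attempt in range(max_retries + 1):
        try:
            status, body = await send(url)
        except Exception as exc:
            logger.error("Request to %s failed: %s", url, exc)
            return None

        if status != 429:
            if status >= 400:
                logger.error("Request to %s returned status %d", url, status)
                return None
            return body

        if attempt < max_retries:
            wait = base_delay * 2 ** attempt  # 1x, 2x, 4x, ... base_delay
            logger.warning("Rate limited on %s, retrying in %.1fs", url, wait)
            await asyncio.sleep(wait)

    logger.error("Giving up on %s after %d retries", url, max_retries)
    return None
```

Returning None on failure, rather than raising, lets a batch run continue past a single bad club while the log records what went wrong.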