
Data pipeline using Beautiful Soup and the Prefect orchestrator to scrape data from Hacker News (HN) as a daily scheduled job.

serena97/scraping_hn

Scraping HN with BeautifulSoup and Prefect

This project creates a data pipeline that scrapes Hacker News every day at 6 pm and loads the transformed data into PostgreSQL.
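The scrape-and-transform step could look roughly like the sketch below. It is not the repository's code: the function name `parse_stories`, the output fields, and the CSS selectors (`tr.athing`, `span.titleline > a`, `span.score`) are assumptions based on Hacker News markup at the time of writing and may break if the site changes. Downloading the page is left out; the function only parses HTML that has already been fetched.

```python
# Hypothetical sketch of the scrape + transform step, not the repository's code.
from bs4 import BeautifulSoup


def parse_stories(html: str) -> list[dict]:
    """Extract id, title, url and points for each story on an HN page."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    # Assumed markup: each story is a <tr class="athing ...">, and its
    # score sits in the <tr> that immediately follows it.
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline > a")
        if link is None:
            continue
        subtext = row.find_next_sibling("tr")
        score = subtext.select_one("span.score") if subtext is not None else None
        stories.append(
            {
                "id": int(row["id"]),
                "title": link.get_text(strip=True),
                "url": link.get("href"),
                # Transform "42 points" into 42; job posts have no score, so use 0.
                "points": int(score.get_text().split()[0]) if score is not None else 0,
            }
        )
    return stories
```

Keeping the parser a pure function of the HTML string makes it easy to test against a saved copy of the page without hitting the network.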

Setting up the environment:

  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  prefect cloud login

How to run the scheduled scraping:

  1. docker-compose up -d -> starts the database; log in to pgAdmin to see the results
  2. python main.py -> runs the scraper every day at 6 pm
  3. prefect deployment run 'scraping/my-hn-deployment' -> triggers a run manually
