Welcome to our open-source project for building practical data engineering skills! It is inspired by the idea that you can break into data engineering at zero cost by focusing on hands-on projects. We will guide you through setting up a real-world data pipeline using Python, BigQuery or Snowflake, and Astronomer (a managed Apache Airflow platform). Whether you're a beginner looking to dive into data engineering or an experienced professional aiming to brush up on your skills, this project i...
This project outlines a step-by-step approach to building a data engineering pipeline, from sourcing data to implementing quality checks. We focus on practical, project-based learning that gives you the hands-on skills data engineering work demands. By the end, you will have built:
- A Python script to fetch data from a REST API.
- A process to dump this data into a CSV file initially.
- A Snowflake or BigQuery setup to manage your data in the cloud.
- An automated pipeline using Astronomer to ingest data on a scheduled basis.
- Data quality checks to ensure the integrity of your data.
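A minimal sketch of the first two pieces (fetching from a REST API and dumping to CSV), using only the Python standard library. The function names `fetch_records`, `extract_records`, and `write_records_to_csv` are hypothetical, not part of this repository's script. It also assumes the API returns JSON that is either a bare array, an object with a `"results"` array (a common paging convention), or a single object; adjust `extract_records` to match your chosen data source.

```python
from __future__ import annotations

import csv
import json
import urllib.request
from typing import Any


def extract_records(payload: Any) -> list[dict[str, Any]]:
    """Normalize a decoded JSON payload into a list of record dicts.

    Assumption: the payload is a bare array, an object wrapping an array
    under "results", or a single object.
    """
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("results"), list):
        return payload["results"]
    return [payload]


def fetch_records(url: str, timeout: float = 10.0) -> list[dict[str, Any]]:
    """GET a JSON endpoint and return its records (raises on HTTP errors)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.load accepts the binary response stream directly.
        payload = json.load(response)
    return extract_records(payload)


def write_records_to_csv(records: list[dict[str, Any]], path: str) -> int:
    """Dump records to a CSV file and return the number of data rows written."""
    if not records:
        open(path, "w", encoding="utf-8").close()  # create an empty file
        return 0
    # Header = union of keys across all records, in first-seen order.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing keys become ""
        writer.writeheader()
        for record in records:
            # Nested objects/arrays don't fit in a CSV cell; store them as JSON text.
            writer.writerow(
                {k: json.dumps(v) if isinstance(v, (dict, list)) else v for k, v in record.items()}
            )
    return len(records)
```

For example, with the Pokémon data source mentioned below, `write_records_to_csv(fetch_records("https://pokeapi.co/api/v2/pokemon?limit=20"), "pokemon.csv")` would write one row per listed Pokémon. Keeping fetching and writing in separate functions means the CSV writer can later be swapped for a Snowflake or BigQuery loader without touching the fetch logic.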
Before you begin, make sure you have the following prerequisites:
- Python installed on your machine.
- An account with Snowflake or BigQuery (free trials or free tiers are available).
- An account with Astronomer.
- Find a Data Source: Choose a data source you are interested in (e.g., stock market, Pokémon, sports data). Make sure it offers a REST API.
- Python Script for Data Fetching: Clone this repository and navigate to the script directory. Modify the script to point to your chosen data source.
- Snowflake/BigQuery Account: Follow the instructions on the Snowflake or BigQuery website to set up a free trial or free-tier account. Then modify the script to load data into your Snowflake/BigQuery instance instead of writing it to a CSV file.
- Astronomer for Automation: Set up an account and follow Astronomer's instructions to deploy an Airflow DAG that runs your ingestion on a schedule.
- Data Quality Checks: Implement data quality checks using Great Expectations or your own custom checks.
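If you go the custom-checks route, here is a minimal, dependency-free sketch. The helpers `expect_not_empty`, `expect_no_nulls`, `expect_unique`, `expect_between`, and `run_checks` are hypothetical names that only borrow the "expectation" style; they are not Great Expectations functions. Column names such as `id` or `name` are assumptions about your data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int = 0


class DataQualityError(Exception):
    """Raised when one or more checks fail, so the pipeline run stops."""


def expect_not_empty(records: List[Record]) -> CheckResult:
    return CheckResult("not_empty", passed=len(records) > 0)


def expect_no_nulls(records: List[Record], column: str) -> CheckResult:
    # Missing keys, None, and empty strings all count as nulls.
    bad = sum(1 for r in records if r.get(column) in (None, ""))
    return CheckResult(f"no_nulls[{column}]", bad == 0, bad)


def expect_unique(records: List[Record], column: str) -> CheckResult:
    # Counts every repeat after the first occurrence; values must be hashable.
    seen: set = set()
    dupes = 0
    for r in records:
        value = r.get(column)
        if value in seen:
            dupes += 1
        else:
            seen.add(value)
    return CheckResult(f"unique[{column}]", dupes == 0, dupes)


def expect_between(records: List[Record], column: str, low: float, high: float) -> CheckResult:
    # Nulls fail this check too; values are assumed to be numeric.
    bad = sum(1 for r in records if r.get(column) is None or not low <= r[column] <= high)
    return CheckResult(f"between[{column}]", bad == 0, bad)


def run_checks(
    records: List[Record], checks: List[Callable[[List[Record]], CheckResult]]
) -> List[CheckResult]:
    """Run every check; raise DataQualityError listing all failures, if any."""
    results = [check(records) for check in checks]
    failures = [r for r in results if not r.passed]
    if failures:
        summary = ", ".join(f"{r.name}: {r.failing_rows} bad rows" for r in failures)
        raise DataQualityError(f"{len(failures)} check(s) failed ({summary})")
    return results
```

Checks that need a column can be bound with `functools.partial`, e.g. `run_checks(records, [expect_not_empty, partial(expect_unique, column="id")])`. Raising instead of only logging is a deliberate choice: when this runs inside an Airflow task, the uncaught exception marks the task as failed, so bad data does not silently flow downstream.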
We welcome contributions from the community! Whether it's adding new features, improving documentation, or reporting bugs, your contributions are greatly appreciated.
- Fork the Repository: Start by forking this repository to your GitHub account.
- Create a Pull Request: After making your changes, create a pull request against our repository. Please provide a clear description of your changes.
- Code Review: Your pull request will be reviewed by our team. We may suggest some changes or improvements.
This project is open-source and available under the MIT License.
- This project was inspired by the concept of learning data engineering through practical, project-based tasks. Special thanks to the creators of "Breaking into data engineering can be 100% free and 100% project-based!"