This project investigates global wildfire trends and their impact on emissions from 2002 to 2023 using datasets from the Global Wildfire Information System (GWIS). It aims to analyze long-term trends, seasonal variations, regional prevalence, and the correlation between burned areas and emission gases to enhance understanding of wildfire patterns and their environmental implications.
- Global Monthly Burned Area Dataset [2002-2023]: Monthly burned area data (in hectares) from 2002 to 2023 for all countries and regions, categorized by land cover type: forests, savannas, shrublands/grasslands, croplands, and others. This supports analysis of fire patterns and their effects across regions and land types.
- Global Monthly Emissions Dataset [2002-2023]: Monthly data on emissions from biomass burning from 2002 to 2023 for all countries and regions, covering CO2, CO, total particulate matter (TPM), PM2.5, total carbon, hydrocarbons, organic carbon, methane, sulfur dioxide, black carbon, and nitrogen oxides. This supports assessment of the environmental impact of biomass burning.
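As a minimal sketch of how the two datasets can be combined for the burned-area/emissions correlation analysis, the example below joins them on country and month. The column names and values are illustrative assumptions, not the actual GWIS schema:

```python
import pandas as pd

# Tiny illustrative frames; the real GWIS exports have one row per
# country-month, with land-cover and gas columns (names assumed here).
burned = pd.DataFrame({
    "country": ["Brazil", "Brazil"],
    "year": [2002, 2003],
    "month": [8, 8],
    "forest_ha": [1200.0, 950.0],
})
emissions = pd.DataFrame({
    "country": ["Brazil", "Brazil"],
    "year": [2002, 2003],
    "month": [8, 8],
    "co2_tonnes": [34000.0, 27000.0],
})

# Join on country and month so burned area can be correlated with emissions.
merged = burned.merge(emissions, on=["country", "year", "month"], how="inner")
print(merged)
```

An inner join keeps only country-months present in both tables, which is what a correlation between burned area and emission gases requires.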
- Data Analysis: Python (Pandas, NumPy)
- Visualization: Matplotlib, Seaborn
- Version Control: Git, GitHub
Project Data Report: Document detailing data cleaning and pipeline procedures.
Project Analysis Report: Final report containing data analysis and visualizations.
Project EDA: Notebook showcasing exploratory data analysis (EDA) for the project.
Instructions for setting up the project environment and running the analysis scripts.
# Clone the repository
git clone https://github.com/puni-ram48/MADE-SS2024.git
# Install dependencies
pip install -r requirements.txt
Data Pipeline
Our project includes an automated data pipeline designed for wildfire analysis:
- Data Fetching: Automatically retrieves monthly wildfire burned area and emission datasets from specified sources.
- Data Transformation and Cleaning: Applies necessary transformations and cleans the data to ensure accuracy and consistency.
- Data Loading: Transformed data is loaded into structured formats suitable for analysis, ensuring integrity for further investigation.
This pipeline ensures that our wildfire data is prepared and maintained for reliable analysis of trends and impacts.
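The fetch/transform/load stages above can be sketched as follows. This is an illustrative outline only, assuming hypothetical URLs, column names, and output paths; the project's actual implementation lives in `automated_datapipeline.py`:

```python
import pandas as pd

def fetch(url: str) -> pd.DataFrame:
    """Fetch: download a monthly CSV from the source (pandas reads URLs directly)."""
    return pd.read_csv(url)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: normalize column names and drop rows missing key fields."""
    df = df.rename(columns=str.lower)
    return df.dropna(subset=["country", "year", "month"])  # key columns assumed

def load(df: pd.DataFrame, path: str) -> None:
    """Load: write the cleaned table to a structured file for analysis."""
    df.to_csv(path, index=False)
```

Chaining the three stages (`load(transform(fetch(url)), path)`) gives a single entry point that a scheduler or CI job can call.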
Test Script
We have developed a rigorous test script to validate our wildfire data pipeline:
- Tests include verification of data fetching accuracy.
- Ensures proper data cleaning and transformation procedures are followed.
- Validates data integrity and consistency throughout the pipeline.
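In the spirit of these checks, a minimal validation sketch is shown below. The column names and the `validate` helper are assumptions for illustration; the project's real tests are in `automated_testing.py`:

```python
import pandas as pd

# Key columns a cleaned table is assumed to contain.
REQUIRED_COLUMNS = {"country", "year", "month", "burned_ha"}

def validate(df: pd.DataFrame) -> list:
    """Return a list of integrity problems found in a cleaned table."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
        return problems
    if (df["burned_ha"] < 0).any():
        problems.append("negative burned area")
    if df[["country", "year", "month"]].isna().any().any():
        problems.append("missing key values")
    return problems
```

An empty list means the table passed; anything else fails the pipeline run.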
Automated Workflow
To maintain the reliability of our wildfire data pipeline, we have set up an automated workflow using GitHub Actions:
- Continuous Integration Tests: Automatically runs our test script on every push to the main branch, ensuring that updates or modifications do not compromise the functionality and accuracy of the data pipeline.
This automated workflow helps keep the data pipeline reliable, supporting high-quality analysis of wildfire trends and impacts.
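A GitHub Actions workflow for this setup might look like the fragment below. The file names and Python version are assumptions; the project's actual workflow file may differ:

```yaml
# .github/workflows/ci.yml (illustrative)
name: CI
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python3 automated_testing.py
```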
Provide detailed instructions on how to execute the data pipeline and run the test scripts. Include any necessary commands or steps to set up the environment.
# command to run the data pipeline
python3 automated_datapipeline.py
# command to execute the test script
python3 automated_testing.py
We welcome contributions to this project! If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Make your changes and commit them (`git commit -am 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Create a new Pull Request.
Please ensure your code is well-documented.
This project was initiated and completed by Puneetha Dharmapura Shrirama.
I would like to extend my gratitude to our tutors Philip Heltweg and Georg Schwarz for their guidance and support throughout this project.
This project is licensed under the MIT License - see the LICENSE file for details.