In this project, we aim to design a web-based disaster response classification system that labels each message with the relevant categories. The project can be divided into the following steps:
- Establishing an ETL pipeline to gather the raw data, wrangle it, and store it in a database.
- Creating a machine learning pipeline to classify disaster response messages.
- Developing a web app to classify disaster response messages in real time.
In order to use the project, you need the following Python packages:
- numpy, pandas
- sklearn, nltk
- SQLAlchemy
- flask, plotly
Clone this repository: `git clone https://github.com/Yuzhe17/Disaster-responses-system-development.git`
To execute the ETL pipeline, run `process_data.py 'data/disaster_messages.csv' 'data/disaster_categories.csv' 'disaster_response.db'`
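
As a rough illustration of what such an ETL step does, here is a minimal sketch. It is not the actual content of process_data.py. It assumes that messages and categories share an `id` column and that each category entry is a semicolon-separated string such as `related-1;water-0`. The helpers `clean_categories` and `run_etl`, the table name `messages`, and the toy rows are all hypothetical. For brevity it writes through a standard-library `sqlite3` connection, whereas the project lists SQLAlchemy.

```python
import sqlite3

import pandas as pd


def clean_categories(categories: pd.DataFrame) -> pd.DataFrame:
    """Expand 'name-0;name-1' strings into one 0/1 integer column per category."""
    expanded = categories["categories"].str.split(";", expand=True)
    # Column names come from the first row, e.g. "related-1" -> "related".
    expanded.columns = [value.rsplit("-", 1)[0] for value in expanded.iloc[0]]
    for column in expanded.columns:
        # Keep only the trailing flag and convert it to an integer.
        expanded[column] = expanded[column].str.rsplit("-", n=1).str[-1].astype(int)
    expanded.insert(0, "id", categories["id"])
    return expanded


def run_etl(messages: pd.DataFrame, categories: pd.DataFrame, connection) -> pd.DataFrame:
    """Merge messages with cleaned categories, drop duplicates, and store the result."""
    merged = messages.merge(clean_categories(categories), on="id").drop_duplicates()
    # "messages" is an assumed table name, not necessarily the one used by the project.
    merged.to_sql("messages", connection, index=False, if_exists="replace")
    return merged


# Made-up toy rows standing in for the two CSV files.
messages = pd.DataFrame(
    {"id": [1, 2], "message": ["We need water", "Roads are flooded"]}
)
categories = pd.DataFrame(
    {
        "id": [1, 2],
        "categories": ["related-1;water-1;floods-0", "related-1;water-0;floods-1"],
    }
)

connection = sqlite3.connect(":memory:")  # in-memory database for the example
run_etl(messages, categories, connection)
stored = pd.read_sql("SELECT * FROM messages", connection)
print(stored)
```

With real data, the two DataFrames would come from `pd.read_csv` on the CSV paths, and the connection would point at the database file instead of `:memory:`.
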
To execute the machine learning pipeline, run `train_classifier.py 'disaster_response.db' 'model/classifier.pkl'`
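
A multi-label text classification pipeline of this kind could look roughly like the sketch below. The choice of `TfidfVectorizer` and a random forest wrapped in `MultiOutputClassifier` is an assumption made for illustration; this README does not state which features or estimator train_classifier.py actually uses. The training messages and the two label columns are made-up toy data.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Made-up toy data: one row per message, one label column per category.
texts = [
    "We need drinking water",
    "No clean water in the village",
    "The river flooded our street",
    "Floods destroyed the bridge",
    "Thank you for your help",
    "Everything is fine here",
]
labels = np.array(  # columns: water, floods
    [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
)

# Text features followed by one binary classifier per category (assumed setup).
pipeline = Pipeline(
    [
        ("tfidf", TfidfVectorizer()),
        (
            "clf",
            MultiOutputClassifier(
                RandomForestClassifier(n_estimators=10, random_state=0)
            ),
        ),
    ]
)
pipeline.fit(texts, labels)

# Predictions have one row per message and one column per category.
new_messages = ["we have no water"]
prediction = pipeline.predict(new_messages)

# The fitted pipeline can be serialized with pickle, as with model/classifier.pkl.
restored = pickle.loads(pickle.dumps(pipeline))
```

In the real script, the serialized bytes would be written to the path given as the second command-line argument rather than kept in memory.
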
To deploy the web app, go to the DisasterResponse_app directory, run run.py, and then open http://localhost:3001/index in a browser.
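
To give an idea of how a Flask route can classify a message on request, here is a minimal sketch. It is not the project's run.py. The `/go` route name is an assumption based on the go.html template, the keyword-based `classify` function is a hypothetical stand-in for the trained model, and the route returns JSON instead of rendering a template so that the example stays self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained classifier: simple keyword matching.
CATEGORY_KEYWORDS = {"water": ["water", "thirsty"], "floods": ["flood"]}


def classify(message: str) -> dict:
    """Return a 0/1 flag per category based on keyword matches."""
    text = message.lower()
    return {
        category: int(any(word in text for word in words))
        for category, words in CATEGORY_KEYWORDS.items()
    }


@app.route("/go")
def go():
    # The message to classify is read from the "query" URL parameter.
    query = request.args.get("query", "")
    return jsonify(query=query, classification=classify(query))


# Exercise the route without starting a server, using Flask's test client.
with app.test_client() as client:
    response = client.get("/go", query_string={"query": "We need water"})
    result = response.get_json()
    print(result)
```

To serve this sketch on the port mentioned above, one would call `app.run(port=3001)`.
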
- DisasterResponse_app/templates/go.html: an HTML file that displays the classification result
- DisasterResponse_app/templates/master.html: an HTML file that displays the visualizations of the data
- DisasterResponse_app/custom_transformers.py: a Python script containing custom sklearn transformers
- DisasterResponse_app/run.py: a Python script containing the web app initialization and routes
- data/: a folder containing two CSV files with the messages and categories of all disaster responses
- model/classifier.pkl: a pickle file of an sklearn classifier
- process_data.py: a Python script implementing an ETL pipeline
- train_classifier.py: a Python script implementing a machine learning pipeline
- disaster_response.db: a SQLite database storing all the disaster response messages and categories
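
For readers unfamiliar with custom sklearn transformers such as those in custom_transformers.py, the sketch below shows the general pattern. The class `WordCountExtractor` is hypothetical and is not taken from the project; it only illustrates how a class that implements `fit` and `transform` can be plugged into an sklearn pipeline.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class WordCountExtractor(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that turns each message into its word count."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the transformer works inside a Pipeline.
        return self

    def transform(self, X):
        # One row per message, one column holding the whitespace-separated word count.
        return np.array([[len(text.split())] for text in X])


features = WordCountExtractor().fit_transform(["We need water", "Help"])
print(features)
```

Such a transformer can be combined with text features, for example through `sklearn.pipeline.FeatureUnion`.
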
Thanks to Figure Eight for providing the dataset.