ML models for predicting payment type and tip amount of 2019 Yellow Cab trips, using PySpark
- pullData.py - Python script that retrieves the 2019 trip records, the lookup table, and the data dictionary
- explorationSummary.ipynb - Notebook exploring the data; includes summaries, graphs, and charts
- tipAmount.ipynb - Notebook that imports, cleans, and preprocesses the data, then fits models (LR and gradient-boosted) for predicting tip amount
- paidClassifier.ipynb - Notebook containing logistic models for predicting whether a trip was paid for
- Data can be found and downloaded from
- Data definition
- Taxi Zone Lookup Table