Data Engineer & Data Scientist — Python · SQL · Airflow · AWS · Spark · scikit-learn
Purdue alum building data pipelines and ML systems that turn messy data into clear decisions. Currently open to full-time Data Engineer and Data Scientist roles.
🌐 Portfolio: portfolio-green-eta-59.vercel.app
- Languages: Python, SQL, JavaScript
- Data & ML: pandas, scikit-learn, NumPy, Jupyter, Streamlit
- Engineering: Airflow, Spark, AWS, Docker, REST APIs
- Viz & Reporting: matplotlib, seaborn, Quarto, Vite
🥉 March Madness Fan Predictor — 3rd place, CCAC 2024
Random forest model that predicts NCAA bracket selections by quantifying geographic fan bias. Reaches 67% accuracy on 76k+ brackets using Haversine distance features and KenPom analytics. Built as a scikit-learn pipeline with an interactive Streamlit dashboard.
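A rough illustration of the Haversine distance feature mentioned above. This is a minimal sketch, not the project's actual code: the function name `haversine_km`, the argument order, and the use of the mean Earth radius (6371 km) are all assumptions made for the example.

```python
import math

# Mean Earth radius in kilometers (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula: a is the squared half-chord length between the points.
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical usage: distance from a fan's location to a school's campus,
# which could then be fed to a model as one numeric feature.
fan_to_school_km = haversine_km(0.0, 0.0, 0.0, 90.0)
```

A feature like this lets a model weigh how close a bracket's author is to each team, which is one plausible way to quantify geographic bias.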
Python tooling that bulk-creates and standardizes ClickUp task hierarchies from JSON templates — turns multi-hour project setup into a one-command run. Tests included.
EDA on retail shopper data — pandas/seaborn pipeline producing customer-segment insights and pairplot visualizations.
Jupyter analysis project with Quarto-rendered HTML reports — reproducible workflow from raw data to publishable findings.
- 📫 Email: mlwhitfi24@gmail.com
- 💼 LinkedIn: michael-whitfield-jr

