Hey there! I'm Sole, a seasoned data scientist, published author, and machine learning instructor with a passion for pushing the boundaries of what's possible in the world of data science. ✨
In my journey, which kicked off in 2015, I've lent my expertise to finance and insurance companies, where I built robust machine learning models for insurance claim assessment, credit risk evaluation, and fraud detection.
In 2017, I pioneered my first online course, Feature Engineering for Machine Learning, recognizing a gap in resources at the time. Since then, I've expanded my course offerings, delving into diverse aspects of machine learning. We now host more than 7 courses on advanced machine learning topics, taught by me and other extraordinary instructors, which you can find at our online school Train in Data.
Additionally, I've given life to the open-source Python library Feature-engine.
Currently, I'm pouring my energy into advancing Feature-engine and creating new, impactful courses on machine learning.
You'll often find me sharing insights about Feature-engine and the broader machine learning landscape through blogs, talks, and podcasts.
Excited to connect, collaborate, and learn together!
Check out the courses that we teach. Courses are up to date and work with the latest Python library releases!
| Courses | What you will learn |
|---|---|
| Feature engineering for machine learning | Learn to create new features, impute missing data, encode categorical variables, transform and discretize features and much more. |
| Feature selection for machine learning | Learn to select features using wrapper, filter, embedded and hybrid methods, and build simpler, more reliable models. |
| Hyperparameter optimization for machine learning | Learn about grid and random search, Bayesian Optimization, Multi-fidelity models, Optuna, Hyperopt, Scikit-Optimize and more. |
| Machine learning with imbalanced data | Learn about under- and over-sampling, ensemble and cost-sensitive methods and improve the performance of models trained on imbalanced data. |
| Feature engineering for time series forecasting | Learn to create lag and window features, impute data in time series, encode categorical variables and much more, specifically for forecasting. |
| Forecasting with Machine Learning | Learn to perform time series forecasting with machine learning models like linear regression, random forests and XGBoost. |
| Machine Learning Interpretability | Learn to interpret and explain white-box and black-box models both globally and locally, with methods including LIME, SHAP, and more. |
| Clustering and Dimensionality Reduction | Learn to extract information from unlabelled data through clustering and dimensionality reduction techniques. |
Discover plenty of feature engineering and feature selection techniques in my books, where I implement a wide range of methods using the latest and most widely used Python libraries.
| Books | Summary |
|---|---|
| Python Feature Engineering Cookbook, third edition | Over 70 code recipes to implement feature engineering in tabular, transactional, time series and text data. |
| Feature selection in machine learning, second edition | Over 20 methods to select the most predictive features and build simpler, faster, and more reliable machine learning models. |
I actively contribute to open-source libraries as part of my commitment to fostering collaborative innovation and enhancing accessibility in the realm of data science and machine learning. Here are some libraries I contributed to:
| Library | About | Role |
|---|---|---|
| Feature-engine | Multiple transformers for missing data imputation, categorical encoding, variable transformation and discretization, feature creation and more. | Maintainer. |
| tsfresh | Automatically create features for time series classification | Expanded documentation. |
| imbalanced-learn | Tools for under- and over-sampling and dealing with imbalanced data | Multiple PRs to improve documentation. |
| BorutaPy | Feature selection using Boruta | Maintainer. |
| Eli5 | Tools for machine learning interpretability | Multiple PRs to maintain the library and improve documentation. |
Stay connected and follow me across these platforms to stay updated on the latest in data science and machine learning:
| Media | Summary |
|---|---|
| Train in Data | Enroll in our courses and find our books. |
| YouTube | I post about data science, machine learning and how to become a data scientist. |
| Newsletter | I talk about data science, machine learning and how to become a data scientist. |
| | I talk about data science, machine learning and how to become a data scientist. |
| Blog | I write about data science, machine learning, feature engineering and selection and more. |
That's it! I hope to see you around.







