Data mining with Python!
- pandas
- numpy
- sklearn
- scipy
- matplotlib
Objective: Preprocess data by removing noise, null values, outliers, and duplicates. Then graph the clean data and split it into training and test datasets.
Learned: You must account for varied user input (e.g., United States can appear as US, USA, united states, US of A, etc.). You also cannot simply drop every record that contains a null value: the attribute that is null may be unimportant, while the record's other attributes are still useful.
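A minimal sketch of these preprocessing steps with pandas and scikit-learn. It assumes a small hypothetical table; the column names, the alias mapping, and the 1.5 × IQR outlier rule are illustrative choices, not taken from the original project. Nulls are dropped only in the `age` column, so rows with a missing `notes` value survive.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw data: inconsistent country spellings, a duplicate row,
# a missing age, a mostly-null "notes" column, and one implausible income.
raw = pd.DataFrame({
    "country": ["US", "USA", "united states", "US of A", "Canada", "canada ", "USA", "Mexico"],
    "age":     [34, 29, np.nan, 41, 38, 25, 29, 52],
    "income":  [52000, 61000, 58000, 1_000_000, 55000, 48000, 61000, 50000],
    "notes":   [None, "vip", None, None, None, None, "vip", None],
})

# Assumed alias table: map each known spelling to one canonical label.
COUNTRY_ALIASES = {
    "us": "United States", "usa": "United States",
    "united states": "United States", "us of a": "United States",
    "canada": "Canada", "mexico": "Mexico",
}
df = raw.copy()
key = df["country"].str.strip().str.lower()
# Unknown spellings keep their original value instead of becoming NaN.
df["country"] = key.map(COUNTRY_ALIASES).fillna(df["country"])

# Duplicates are only detectable after the spellings are normalized.
df = df.drop_duplicates()

# Drop rows only where an attribute we care about is null; "notes" may stay empty.
df = df.dropna(subset=["age"])

# Remove income outliers with the interquartile-range rule.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Split the cleaned rows into training and test sets.
train, test = train_test_split(clean, test_size=0.2, random_state=42)
print(clean)
```

With matplotlib installed, a call such as `clean.plot.scatter(x="age", y="income")` would graph the cleaned rows before splitting.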
Objective: Preprocess data and apply Simple or Multiple Linear Regression. Normalize the data and classify the data set.
Learned: You need to understand your data in order to get meaningful output.
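A sketch comparing simple and multiple linear regression on synthetic data; the two features and the coefficients (3, 2, and intercept 5) are made up for illustration. Features are standardized with `StandardScaler` inside a pipeline. The simple model sees only the first feature, so it explains less of the target's variance on the test set, which is one concrete way that knowing which attributes drive the target matters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: y depends on two features plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Simple linear regression: one predictor (feature 0), standardized first.
simple = make_pipeline(StandardScaler(), LinearRegression())
simple.fit(X_train[:, [0]], y_train)
simple_r2 = simple.score(X_test[:, [0]], y_test)

# Multiple linear regression: both predictors, standardized first.
multiple = make_pipeline(StandardScaler(), LinearRegression())
multiple.fit(X_train, y_train)
multiple_r2 = multiple.score(X_test, y_test)

print(f"simple R^2:   {simple_r2:.3f}")
print(f"multiple R^2: {multiple_r2:.3f}")
```

Fitting the scaler inside the pipeline means its mean and variance come from the training split only, so the test set does not influence the normalization.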
Objective: Implement Classification Models (Decision Trees, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, Logistic Regression, Artificial Neural Networks)
Learned: Feature importance analysis is an essential step before applying these classification models.
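One possible workflow, sketched on scikit-learn's bundled Iris dataset (an assumption; the original data set is not named): rank features with a decision tree's impurity-based importances, keep the top two, then compare the six classifier families listed above with 5-fold cross-validation. Hyperparameters such as `hidden_layer_sizes=(16,)` and `n_neighbors=5` are arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target

# Step 1: rank features by a decision tree's impurity-based importances.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ranking = np.argsort(tree.feature_importances_)[::-1]
top_idx = ranking[:2]
top_names = [data.feature_names[i] for i in top_idx]
print("top features:", top_names)

# Step 2: compare the classifier families on the selected features only.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
scores = {}
for name, model in models.items():
    # Scaling matters for SVM, KNN, logistic regression and the MLP.
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X[:, top_idx], y, cv=5).mean()

for name, score in scores.items():
    print(f"{name:20s} {score:.3f}")
```

Selecting features on the full dataset before cross-validation leaks a little information into the scores; a stricter version would do the selection inside each fold.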
Objective: Implement Clustering, Text mining and Artificial Neural Networks
Learned: Clustering is very sensitive to its input; small changes to the input (such as feature scaling) can dramatically change the output from what you expect.
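A small illustration of that sensitivity, using synthetic data and k-means (one clustering algorithm among several; the data and seeds are hypothetical). The true groups differ only in a small-scale feature, so k-means on the raw features splits along a large-scale noise feature instead, while standardizing the features recovers the groups. Agreement with the true labels is measured with the adjusted Rand index (ARI), where 1.0 is a perfect match and values near 0 mean chance-level agreement.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 100
# Hypothetical data: the real groups differ only in feature 0 (small scale),
# while feature 1 is large-scale noise unrelated to the groups.
true_labels = np.repeat([0, 1], n)
feature0 = np.concatenate([rng.normal(0.0, 0.1, n), rng.normal(1.0, 0.1, n)])
feature1 = rng.normal(500.0, 100.0, 2 * n)
X = np.column_stack([feature0, feature1])


def cluster(data):
    """Run k-means with k=2 and a fixed seed so the comparison is repeatable."""
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)


ari_raw = adjusted_rand_score(true_labels, cluster(X))
ari_scaled = adjusted_rand_score(true_labels, cluster(StandardScaler().fit_transform(X)))
print(f"ARI on raw features:    {ari_raw:.2f}")
print(f"ARI on scaled features: {ari_scaled:.2f}")
```

Because k-means minimizes squared Euclidean distance, whichever feature has the largest numeric range dominates the result unless the features are put on a common scale first.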