|
| 1 | +# **LOGISTIC REGRESSION** |
| 2 | + |
| 3 | +Logistic Regression is a supervised machine learning algorithm that learns from labelled data. |
| 4 | + |
| 5 | +It is widely used for classification tasks. |
| 6 | + |
| 7 | +**GOAL** |
| 8 | + |
| 9 | +The goal is to use and understand this algorithm through a project: water potability prediction. |
| 10 | + |
| 11 | + |
| 12 | +**PURPOSE** |
| 13 | + |
| 14 | +To give a clear understanding of what logistic regression is: its use cases, advantages, disadvantages, and how it works. |
| 15 | + |
| 16 | + |
| 17 | +**DATASET** |
| 18 | + |
| 19 | +The dataset used is available [on Kaggle](https://www.kaggle.com/adityakadiwal/water-potability). |
| 20 | + |
| 21 | + |
| 22 | +**DESCRIPTION** |
| 23 | + |
| 24 | +Let's understand this algorithm with the help of a project: water potability prediction, where we have to classify the target variable 'Potability', which |
| 25 | +indicates whether water is safe for human consumption (1) or not (0). |
| 26 | + |
| 27 | +Access to safe drinking-water is essential to health, a basic human right and a component of effective policy for health protection. This is important as a health and development issue at a national, regional and local level. In some regions, it has been shown that investments in water supply and sanitation can yield a net economic benefit, since the reductions in adverse health effects and health care costs outweigh the costs of undertaking the interventions. |
| 28 | + |
| 29 | + |
| 30 | +**WHAT I HAVE DONE** |
| 31 | + |
| 32 | +Step 1: Data Preprocessing & Exploration. |
| 33 | + |
| 34 | +Step 2: Data Visualization. |
| 35 | + |
| 36 | +Step 3: Data Training & Model Creation. |
| 37 | + |
| 38 | +Step 4: Performance Evaluation. |
| 39 | + |
| 40 | + |
| 41 | +**WORKFLOW** |
| 42 | + |
| 43 | +1. Data preprocessing and exploration. |
| 44 | + |
| 45 | +Here we preprocess and explore the data to understand what we are working with: its shape, memory usage, columns, data types, etc. |
| 46 | + |
| 47 | + |
| 48 | + |
| 49 | +Also check for null or missing values. If any are found, remove those rows or replace the values with a measure of central tendency (mean, median, or mode). |
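
The exploration and missing-value handling described in this step might look roughly like the sketch below. The tiny DataFrame is a hypothetical stand-in for the real dataset, and the column names `ph` and `Sulfate` are assumptions made for illustration; only `Potability` is named in this README.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the water potability data.
# Column names (other than Potability) and all values are illustrative assumptions.
df = pd.DataFrame({
    "ph": [7.0, np.nan, 6.5, 8.1, 7.4],
    "Sulfate": [330.0, 310.0, np.nan, 350.0, 340.0],
    "Potability": [0, 1, 0, 1, 1],
})

# Explore: shape, column data types, and memory usage.
print(df.shape)
df.info(memory_usage="deep")

# Count missing values per column.
print(df.isnull().sum())

# Option A: remove every row that contains a missing value.
df_dropped = df.dropna()

# Option B: replace missing values with each column's median
# (one common measure of central tendency).
df_filled = df.fillna(df.median(numeric_only=True))
print(df_filled)
```

Dropping rows is simpler but discards data; filling with the median keeps every row at the cost of slightly distorting the column's distribution.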
| 50 | + |
| 51 | + |
| 52 | +2. Data visualization to draw insights and get a better understanding of the different columns present in the dataset. |
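
As a rough illustration of this step, the sketch below plots the class balance of the target and the distribution of one feature per class. It uses matplotlib directly on made-up data; the `ph` column and all values are assumptions, and the actual plots in the project may differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; the ph column and all values are illustrative assumptions.
df = pd.DataFrame({
    "ph": [6.8, 7.1, 7.9, 6.2, 7.4, 8.3, 6.9, 7.6],
    "Potability": [0, 1, 1, 0, 0, 1, 0, 1],
})

fig, (ax_counts, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: how many samples fall into each class.
counts = df["Potability"].value_counts().sort_index()
ax_counts.bar(counts.index.astype(str), counts.values)
ax_counts.set_xlabel("Potability")
ax_counts.set_ylabel("Number of samples")

# Right panel: distribution of one feature, split by class.
for label, group in df.groupby("Potability"):
    ax_hist.hist(group["ph"], bins=5, alpha=0.6, label=f"Potability={label}")
ax_hist.set_xlabel("ph")
ax_hist.set_ylabel("Frequency")
ax_hist.legend()

fig.tight_layout()
# Call fig.savefig("overview.png") to write the figure to disk.
```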
| 53 | + |
| 54 | + |
| 55 | + |
| 56 | +3. Data training and model creation: split the data into training and testing sets using `train_test_split` from sklearn, then import the logistic regression model, initialize it, fit it on the training data, and finally make predictions on the test data. |
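
A minimal sketch of the split, fit, and predict sequence described in this step is shown below. To keep it self-contained it uses synthetic data from `make_classification` rather than the water potability CSV; the sample count, feature count, test size, and `max_iter` value are all assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real features
# and the Potability target (sizes are illustrative assumptions).
X, y = make_classification(
    n_samples=200, n_features=9, n_informative=5, random_state=42
)

# Hold out 20% of the rows for testing; stratify keeps the class ratio similar
# in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Initialize the model and fit it on the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict class labels (0 or 1) for the unseen test rows.
y_pred = model.predict(X_test)
print(y_pred[:10])
```

With the real dataset, `X` would be the feature columns and `y` the `Potability` column.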
| 57 | + |
| 58 | + |
| 59 | + |
| 60 | + |
| 61 | +4. Performance evaluation: check the error and accuracy to see how well the algorithm performed on this project. |
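
One way to carry out the accuracy and error check is sketched below, using `accuracy_score` and `confusion_matrix` from sklearn. The label lists are hand-written for illustration and are not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written example labels (not real project results).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]

# Accuracy: fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

# Error rate: fraction of predictions that are wrong.
error_rate = 1 - accuracy

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy:   {accuracy:.2f}")
print(f"Error rate: {error_rate:.2f}")
print(cm)
```

The confusion matrix shows which kind of mistake the model makes, which matters here because labelling unsafe water as safe is worse than the reverse.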
| 62 | + |
| 63 | + |
| 64 | + |
| 65 | +**STATE YOUR PROCEDURE** |
| 66 | + |
| 67 | +The step-by-step procedure that I followed is given above, along with the respective screenshots. |
| 68 | + |
| 69 | +Since our target variable 'Potability' takes binary values (0 or 1), I have used the logistic regression classification algorithm to classify water as safe for drinking or not. |
| 70 | + |
| 71 | +Other classification algorithms can also be applied to this dataset to compare them and find which one works best. |
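
To illustrate why logistic regression suits a 0/1 target, the sketch below shows the logistic (sigmoid) function that maps a linear score to a probability, followed by a threshold that turns the probability into a class label. The function names and the 0.5 threshold are assumptions chosen for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(score: float, threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(score) >= threshold else 0


# A score of 0 sits exactly on the decision boundary (probability 0.5).
print(sigmoid(0.0))
print(predict_label(2.0), predict_label(-2.0))
```

In a fitted model the score is a weighted sum of the features plus an intercept; the weights are what training learns.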
| 72 | + |
| 73 | + |
| 74 | +**USAGE** |
| 75 | +- Used for classification tasks and projects. |
| 76 | +- Applied in projects across fields such as medicine and education. |
| 77 | + |
| 78 | + |
| 79 | +**USE CASES** |
| 80 | + |
| 81 | +- E-mail classification as spam or not. |
| 82 | +- Insurance claim prediction: estimating how likely a policyholder is to file a claim. |
| 83 | +- MBA course specialization prediction. |
| 84 | +- Many other classification projects. |
| 85 | + |
| 86 | + |
| 87 | +**LIBRARIES USED** |
| 88 | + |
| 89 | +- pandas |
| 90 | +- matplotlib |
| 91 | +- seaborn |
| 92 | +- numpy |
| 93 | +- sklearn |
| 94 | + |
| 95 | + |
| 96 | +**ADVANTAGES** |
| 97 | + |
| 98 | +- Easy to understand and interpret. |
| 99 | +- Widely used for classification tasks. |
| 100 | +- Suitable when the target variable is discrete. |
| 101 | +- Can yield useful insights that may inform future business plans. |
| 102 | + |
| 103 | + |
| 104 | +**DISADVANTAGES** |
| 105 | +- Cannot be used for regression tasks, i.e. predicting continuous values. |
| 106 | +- Can give low accuracy if the relevant features are not selected properly. |
| 107 | + |
| 108 | + |
| 109 | + |
| 110 | +**APPLICATIONS** |
| 111 | + |
| 112 | +- Used in real life in many areas, such as engineering, civil planning, law, and business. |
| 113 | +- Can help improve business plans by providing useful insights and predictive analysis. |
| 114 | + |
| 115 | + |
| 116 | + |
| 117 | +**CONCLUSION** |
| 118 | + |
| 119 | +- Logistic Regression is an efficient algorithm that is widely used for classification tasks. |
| 120 | +- It is used in real life in many areas, such as engineering, civil planning, law, and business. |
| 121 | +- It can help improve business plans by providing useful insights and predictive analysis. |
| 122 | + |
| 123 | +**REFERENCES** |
| 124 | +- https://machinelearningmastery.com/logistic-regression-for-machine-learning/ |
| 125 | +- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html |
| 126 | +- https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc |
| 127 | + |
| 128 | +**Author** |
| 129 | + |
| 130 | +[Ayushi Shrivastava](https://github.com/ayushi424) |
| 131 | + |
| 132 | + |
| 133 | + |