Logistic Regression #114 #115

Merged
merged 11 commits into from
Jul 30, 2021
133 changes: 133 additions & 0 deletions Machine Learning/Algorithms/Logistic Regression/README.md
@@ -0,0 +1,133 @@
# **LOGISTIC REGRESSION**

Logistic Regression is a supervised ML algorithm, which means it learns from labelled data.

It is widely used for classification tasks and projects.

**GOAL**

The goal is to use and understand this algorithm through a project: water potability prediction.


**PURPOSE**

To give a clear understanding of what logistic regression is all about: its use cases, advantages, disadvantages, how it works, etc.


**DATASET**

For the dataset being used, [click here](https://www.kaggle.com/adityakadiwal/water-potability).


**DESCRIPTION**

Let's understand this algorithm with the help of a project, i.e. water potability prediction, where we have to classify the target variable 'Potability', which indicates whether water is safe for human consumption (1) or not (0).
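
To make the 0/1 classification concrete, here is a minimal sketch of how logistic regression turns a weighted sum of features into a probability and then into a class label. The weights, bias, feature values, and the 0.5 threshold are made-up numbers for illustration, not values fitted on the water potability dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability, label) for a single sample."""
    # Linear score: w1*x1 + w2*x2 + ... + b
    score = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(score)
    # Label 1 when the probability reaches the threshold, else 0.
    return probability, int(probability >= threshold)


# Hypothetical weights and a hypothetical (already scaled) sample.
weights = [0.8, -1.2]
bias = 0.1
sample = [0.5, 0.3]

probability, label = predict_label(sample, weights, bias)
print(f"P(Potability = 1) = {probability:.3f}, predicted label = {label}")
```

Training a logistic regression model amounts to finding the weights and bias that make these probabilities match the labelled data as closely as possible.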

Access to safe drinking-water is essential to health, a basic human right and a component of effective policy for health protection. This is important as a health and development issue at a national, regional and local level. In some regions, it has been shown that investments in water supply and sanitation can yield a net economic benefit, since the reductions in adverse health effects and health care costs outweigh the costs of undertaking the interventions.


**WHAT I HAVE DONE**

Step 1: Data Preprocessing & Exploration.

Step 2: Data Visualization.

Step 3: Data Training & Model Creation.

Step 4: Performance Evaluation.


**WORKFLOW**

1. Data preprocessing and exploration to understand what kind of data we will be working on.

Here we preprocess and explore the data to understand its shape, memory usage, columns, data types, etc.
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr1.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr2.jpg)
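
The exploration step can be sketched roughly as follows. This is a minimal sketch on a tiny hand-made DataFrame, not the code from the screenshots; the column names other than `Potability`, all values, and the CSV file name mentioned in the comment are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the water potability data (values are made up).
# With the real file you would load it instead, for example with
# pd.read_csv("water_potability.csv"), where the file name is an assumption.
df = pd.DataFrame(
    {
        "ph": [7.1, np.nan, 8.3, 6.5, 9.0, 5.8],
        "Hardness": [204.9, 129.4, 224.2, 214.4, 181.1, 188.3],
        "Sulfate": [368.5, np.nan, 356.9, 310.1, 326.7, np.nan],
        "Potability": [0, 0, 1, 1, 0, 1],
    }
)

print(df.shape)               # (number of rows, number of columns)
print(df.columns.tolist())    # column names
print(df.dtypes)              # data type of each column
df.info(memory_usage="deep")  # non-null counts and memory usage
print(df.describe())          # summary statistics of the numeric columns
```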

Also, check for any null or missing values; if any are found, remove them or replace them with a measure of central tendency (such as the mean or median).
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr3.jpg)
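
A minimal sketch of the missing-value check, again on a small made-up DataFrame (column names other than `Potability` and all values are assumptions). It shows both options mentioned above: dropping incomplete rows, or filling gaps with the column mean.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset; values are made up for illustration.
df = pd.DataFrame(
    {
        "ph": [7.1, np.nan, 8.3, 6.5, 9.0, 5.8],
        "Hardness": [204.9, 129.4, 224.2, 214.4, 181.1, 188.3],
        "Sulfate": [368.5, np.nan, 356.9, 310.1, 326.7, np.nan],
        "Potability": [0, 0, 1, 1, 0, 1],
    }
)

# Count missing values per column.
print(df.isnull().sum())

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: replace missing values with a measure of central tendency.
# The mean is used here; the median is a common alternative for skewed columns.
filled = df.fillna(df.mean(numeric_only=True))

print(dropped.shape)
print(filled.isnull().sum().sum())
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of introducing estimated values.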

2. Data visualization to draw insights and get a better understanding of the different columns present in the dataset.
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr4.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr5.jpg)
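
As an illustration of this step, the sketch below plots the class balance of the target and the distribution of one feature per class. It uses only matplotlib on made-up data; the feature column `ph`, the values, and the choice of plots are assumptions, and the screenshots above may show different charts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the dataset; values are made up for illustration.
df = pd.DataFrame(
    {
        "ph": [7.1, 6.9, 8.3, 6.5, 9.0, 5.8, 7.4, 7.8],
        "Potability": [0, 0, 1, 1, 0, 1, 0, 1],
    }
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: how many samples fall into each class.
counts = df["Potability"].value_counts().sort_index()
axes[0].bar(counts.index.astype(str), counts.values)
axes[0].set_title("Potability class counts")
axes[0].set_xlabel("Potability")
axes[0].set_ylabel("Number of samples")

# Right panel: distribution of one feature, drawn separately per class.
for label, group in df.groupby("Potability"):
    axes[1].hist(group["ph"], bins=5, alpha=0.6, label=f"Potability = {label}")
axes[1].set_title("ph distribution by class")
axes[1].set_xlabel("ph")
axes[1].legend()

fig.tight_layout()
```

In a notebook, remove the `matplotlib.use("Agg")` line so the figure is displayed inline.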

3. Data training and model creation. First, use the `train_test_split` method from sklearn to split the data into training and testing sets. Then create the model with the logistic regression algorithm: import the model, initialize it, fit it on the training data, and finally perform predictions on the test data.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr6.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr7.jpg)
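
The split, fit, and predict sequence described in this step can be sketched as below. The data is generated with `make_classification` as a stand-in for the real features and the 'Potability' target; the 80/20 split, `stratify`, `random_state`, and `max_iter` values are assumptions, not settings taken from the screenshots.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: X plays the role of the feature columns,
# y the role of the binary 'Potability' target.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)

# Hold out 20% of the rows for testing; stratify keeps the class ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Initialize the model, fit it on the training data, predict on the test data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(y_pred[:10])
```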

4. Performance evaluation: check the error and accuracy to find out how well the algorithm performed for this project.
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Logistic%20Regression/Images/lr8.jpg)
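
A minimal sketch of the accuracy and error check, on the same kind of synthetic stand-in data (so the printed numbers say nothing about the real dataset). The confusion matrix is an addition not mentioned above; it shows which class the errors fall into.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and the 'Potability' target.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy is the fraction of correct predictions; the error rate is its complement.
accuracy = accuracy_score(y_test, y_pred)
error = 1.0 - accuracy

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy = {accuracy:.3f}, error = {error:.3f}")
print(cm)
```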


**STATE YOUR PROCEDURE**

The step-by-step procedure that I followed is given above, along with the respective screenshots.

Since our target variable 'Potability' has binary values 0 or 1, I have used the logistic regression classification algorithm here to classify water as safe for drinking or not.

Other classification algorithms can also be used on this dataset to draw comparison between them, and find which algorithm works best.
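
One way such a comparison could be set up is with cross-validation, as in the sketch below. The two alternative classifiers, the 5-fold setting, and the synthetic data are all assumptions chosen for illustration; the source does not name specific algorithms to compare.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the features and the 'Potability' target.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)

# Hypothetical set of candidate classifiers.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, estimator in candidates.items():
    # 5-fold cross-validated accuracy, averaged over the folds.
    scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy = {results[name]:.3f}")

best = max(results, key=results.get)
print(f"Best on this data: {best}")
```

Cross-validation averages over several splits, so the ranking depends less on one lucky or unlucky train/test split.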


**USAGE**
- Used for classification tasks and projects.
- Applied in projects across fields such as medicine and education.


**USE CASES**

- E-mail classification as spam or not.
- Insurance claim prediction, i.e. predicting how likely a policy holder is to claim insurance.
- MBA course specialization prediction.
- And many more classification projects.


**LIBRARIES USED**

- pandas
- matplotlib
- seaborn
- numpy
- sklearn


**ADVANTAGES**

- Easy to understand and interpret.
- Widely used for classification tasks.
- Suitable where the target variable is discrete.
- Can yield useful insights that might enhance future business plans.
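
The interpretability point can be illustrated with the fitted coefficients: exponentiating a coefficient gives the factor by which the odds of class 1 change for a one-unit increase in that feature, with the other features held fixed. This is a sketch on synthetic data with hypothetical feature names; note that scikit-learn's `LogisticRegression` applies L2 regularization by default, which shrinks the coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with hypothetical feature names, for illustration only.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) = multiplicative change in the odds of class 1
# per one-unit increase in the feature.
odds_ratios = np.exp(model.coef_[0])

for name, coef, ratio in zip(feature_names, model.coef_[0], odds_ratios):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: coefficient = {coef:+.3f}, odds ratio = {ratio:.3f} ({direction} the odds)")
```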


**DISADVANTAGES**
- Cannot be used for regression tasks, where the target variable is continuous.
- Can give low accuracy if features are not identified properly.



**APPLICATIONS**

- Used in real life in many areas, such as engineering, civil planning, law, and business.
- Can help improve business plans by providing useful insights and predictive analysis.



**CONCLUSION**

- Logistic Regression is an efficient classification algorithm that is widely used for classification tasks.
- It is used in real life in many areas, such as engineering, civil planning, law, and business.
- It can help improve business plans by providing useful insights and predictive analysis.

**REFERENCES**
- https://machinelearningmastery.com/logistic-regression-for-machine-learning/
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
- https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc

**Author**

[Ayushi Shrivastava](https://github.com/ayushi424)


