Commit 0f25c24

Merge pull request #123 from ayushi424/main
Decision Tree Regressor #121
2 parents a2b4e43 + 883f47b commit 0f25c24

10 files changed

+2326
-0
lines changed

@@ -0,0 +1,136 @@
## **DECISION TREE REGRESSOR**

**INTRODUCTION**

The decision tree is one of the most efficient and widely used algorithms in machine learning.

It comes in two forms:

- Decision Tree Classifier (for classification tasks)
- Decision Tree Regressor (for regression tasks)

**PURPOSE**

The main purpose is to give a clear understanding of what the Decision Tree Regressor algorithm is: how it works, where it is used, the libraries involved, and its advantages and disadvantages.

**DATASET**

The dataset used in this project is available [here](https://www.kaggle.com/benroshan/factors-affecting-campus-placement).

**BRIEF EXPLANATION**
- Let's understand the Decision Tree Regressor algorithm with the help of the project 'Campus Placement Analysis & Prediction'.

Campus placement, or campus recruiting, is a program conducted within universities or other educational institutions to provide jobs to students nearing completion of their studies. In this type of program, the educational institutions partner with corporations that wish to recruit from the student population.

In this project we analyze various features that affect campus placements and then perform prediction using the decision tree regressor algorithm.

**WORKING CONDITIONS**

1. Data preprocessing and exploration to understand what kind of data we will be working on.

- Here we preprocess and explore the data to understand what we are working with: its shape, memory usage, columns, data types, etc.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr1.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr2.jpg)

- We also need to check for null or missing values and, if any are found, replace or remove them accordingly.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr3.jpg)

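The exploration and missing-value checks described in step 1 could look roughly like the sketch below. It uses a tiny in-memory table instead of the real CSV so that it runs on its own; the column names (`ssc_p`, `workex`, `status`, `salary`) and the fill strategies are assumptions chosen for illustration, not necessarily what the project does.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the placement data; column names and values are assumed.
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F"],
    "ssc_p": [67.0, 79.3, np.nan, 56.0],
    "workex": ["No", "Yes", "No", "No"],
    "status": ["Placed", "Placed", "Not Placed", "Placed"],
    "salary": [270000.0, 200000.0, np.nan, 250000.0],
})

# Basic exploration: shape, column types, and memory usage.
print(df.shape)
print(df.dtypes)
df.info(memory_usage="deep")

# Count missing values per column.
missing = df.isnull().sum()
print(missing)

# One possible treatment (illustrative only): fill a numeric score with its
# median and treat a missing salary as 0.
clean = df.copy()
clean["ssc_p"] = clean["ssc_p"].fillna(clean["ssc_p"].median())
clean["salary"] = clean["salary"].fillna(0)
```

Whether to fill or drop rows depends on what a missing value means in each column, so the choices here should be read as placeholders.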
2. Data visualization to draw insights and get a better understanding of the different columns in the dataset.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr4.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr5.jpg)

Insights drawn through data visualization & analysis:
* 64.65% of candidates are male and the remaining 35.35% are female.
* The lowest SSC percentage among all candidates is 40.89% and the highest is 89.4%.
* 53.95% of candidates took their SSC exams under the Central Board and the remaining 46.05% under other boards.
* The lowest HSC percentage among all candidates is 37.0% and the highest is 97.7%.
* Only 39.07% of candidates took their HSC exams under the Central Board; the remaining 60.93% were from other boards.
* 52.56% of candidates are from the Commerce stream, 42.33% from the Science stream, and the remaining 5.12% from the Arts stream.
* The lowest degree percentage among all candidates is 50.0% and the highest is 91.0%.
* The degree title of 67.44% of candidates is Commerce & Management, of 27.44% is Science & Technology, and the remaining 5.12% have another degree title.
* 65.58% of candidates have no work experience and the remaining 34.42% have work experience.
* The lowest employability test percentage (test conducted by the college) among all candidates is 50.0% and the highest is 98.0%.
* 55.81% of candidates have a Marketing & Finance MBA specialization and the remaining 44.19% have a Marketing & HR MBA specialization.
* 68.84% of candidates have been placed and the remaining 31.16% have not been placed.

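Percentages and ranges like the ones listed above can be computed directly with pandas. The following is a minimal sketch on made-up rows; the column names (`gender`, `hsc_s`, `ssc_p`) are assumptions, and the numbers it produces are not the project's figures.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F"],
    "hsc_s": ["Commerce", "Science", "Commerce", "Arts"],
    "ssc_p": [67.0, 79.5, 41.0, 89.4],
})

# Share of each category, as a percentage rounded to two decimals.
gender_pct = (df["gender"].value_counts(normalize=True) * 100).round(2)
stream_pct = (df["hsc_s"].value_counts(normalize=True) * 100).round(2)

# Lowest and highest value of a numeric column.
ssc_min, ssc_max = df["ssc_p"].min(), df["ssc_p"].max()

print(gender_pct)
print(stream_pct)
print(ssc_min, ssc_max)
```

The resulting series can be passed to a plotting call (for example a bar chart) to get visual versions of the same breakdowns.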
3. Data training using the train-test-split method from sklearn to split the data into training and testing sets, followed by model creation with the decision tree regressor algorithm: we import the model, initialize it, fit the training data to it, and lastly perform predictions on the test data.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr6.jpg)
![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr7.jpg)

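The split, fit, and predict sequence from step 3 can be sketched as follows. The data here is synthetic (random features and a made-up target), and the 80/20 split and `random_state` values are assumptions; only the scikit-learn calls themselves mirror the steps described.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 200 rows, 3 numeric features, one numeric target.
rng = np.random.default_rng(0)
X = rng.uniform(40, 100, size=(200, 3))
y = 1000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 5000, size=200)

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Import the model, initialise it, fit it, then predict on unseen rows.
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(X_train.shape, X_test.shape, y_pred[:5])
```

Fixing `random_state` in both calls keeps the split and the fitted tree reproducible between runs.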
4. Performance check using error and accuracy metrics to find how well the algorithm performed for this project.

![](https://github.com/ayushi424/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/Decision%20Tree%20Regressor/Images/dtr8.jpg)

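An error and score check for a regressor might look like the sketch below. The true and predicted values are invented, and the choice of metrics (MAE, RMSE, R²) is an assumption about what "error and accuracy" refers to. For scikit-learn regressors, the `score` method returns R² rather than a classification accuracy.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented true and predicted targets, for illustration only.
y_test = np.array([250000.0, 300000.0, 200000.0, 400000.0])
y_pred = np.array([260000.0, 290000.0, 200000.0, 380000.0])

# Average absolute error, in the same units as the target.
mae = mean_absolute_error(y_test, y_pred)

# Root mean squared error penalises large misses more heavily.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# R^2: 1.0 is a perfect fit, 0.0 matches always predicting the mean.
r2 = r2_score(y_test, y_pred)

print(mae, rmse, r2)
```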
**USAGE**
- It is used for regression tasks.
- It is applied in various sectors such as engineering, healthcare, and education.

**USE CASES**
- Regression tasks.
- Mainly used for complex projects.
- Used in projects with a large number of features.

**LIBRARIES USED**
- pandas
- matplotlib
- seaborn
- sklearn
- numpy

**ADVANTAGES**
- Requires little data preparation. Other techniques often require data normalisation, creation of dummy variables, and removal of blank values. Note, however, that depending on the scikit-learn version, this module may not support missing values.
- Easy to understand and interpret.
- Good for regression tasks.

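To illustrate the interpretability point, a fitted tree can be printed as plain if/else rules with scikit-learn's `export_text`. This is a small sketch on synthetic data; the feature names `ssc_p` and `hsc_p` and the step-shaped target are assumptions made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target depends only on whether the first feature
# exceeds 70, so the learned rules are easy to read.
rng = np.random.default_rng(0)
X = rng.uniform(40, 100, size=(100, 2))
y = np.where(X[:, 0] > 70, 300000.0, 200000.0)

# A shallow tree keeps the printed rules short.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Render the splits as indented text using the assumed feature names.
rules = export_text(tree, feature_names=["ssc_p", "hsc_p"])
print(rules)
```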
**DISADVANTAGES**
- Decision-tree learners can create over-complex trees that do not generalise well from the training data. This is called overfitting. Mechanisms such as pruning, setting the minimum number of samples required at a leaf node, or setting the maximum depth of the tree are necessary to avoid this problem.
- The relationships between features should be well understood, or else performance may be poor.
- Can give low accuracy if the features are not chosen properly.
- Cannot be used for classification projects (the Decision Tree Classifier covers those).

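The overfitting controls mentioned in the first point map onto `DecisionTreeRegressor` parameters such as `max_depth` and `min_samples_leaf`. The sketch below compares an unrestricted tree with a limited one on synthetic noisy data; the specific limits (depth 4, 5 samples per leaf) are arbitrary assumptions, not tuned values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data: a sine curve plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unrestricted tree grows until it memorises the training rows.
full = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# A limited tree: capped depth and a minimum number of samples per leaf.
limited = DecisionTreeRegressor(
    max_depth=4, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# Compare depth, training R^2 and test R^2 for both trees.
for name, m in [("unrestricted", full), ("limited", limited)]:
    print(name, m.get_depth(),
          round(m.score(X_train, y_train), 3),
          round(m.score(X_test, y_test), 3))
```

A large gap between training and test scores for the unrestricted tree is the usual sign of overfitting. Cost-complexity pruning through the `ccp_alpha` parameter is another option in scikit-learn.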
**APPLICATIONS**

- Used in a wide variety of regression tasks.
- Decision trees handle non-linear data sets effectively.
- Decision trees are used in real life in many areas such as engineering, healthcare, civil planning, law, and business.
- Can help improve business plans by providing useful insights and predictive analysis.

**CONCLUSION**

* Decision Tree Regressor is an efficient and widely used algorithm for regression tasks.

* For this project, i.e. 'Campus Placement Analysis & Prediction', the reported accuracy of the decision tree regressor is 1.00, i.e. 100%.

* It can help improve business plans by providing useful insights and predictive analysis.

**Some beginner friendly REFERENCES**

- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
- https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
- https://www.geeksforgeeks.org/python-decision-tree-regression-using-sklearn/
- https://towardsdatascience.com/machine-learning-basics-decision-tree-regression-1d73ea003fda

**Author**

[Ayushi Shrivastava](https://github.com/ayushi424)

**Happy Learning :)**
