
Commit 3aa866e

Merge pull request #148 from photon149/main
K Medoid Based Clustering
2 parents b691291 + b81d788

File tree: 5 files changed, +709 −0 lines changed


Machine Learning/Algorithms/K Medoids Clustering/Model/kmedoid_clustering.ipynb (+626 lines; large diffs are not rendered by default)
## **PROJECT TITLE**
K-Medoids Clustering

## **INTRODUCTION**
In this tutorial we will use a Jupyter Notebook to learn how the K-Medoids clustering algorithm works and how to apply it.

## **DESCRIPTION**
K-Medoids is a clustering algorithm in machine learning that uses medoids (i.e. actual objects in a cluster) to represent each cluster. Every object is assigned to the cluster whose representative object (medoid) it is most similar to, and the partitioning is then performed on the principle of minimizing the sum of the dissimilarities between each object p and its corresponding representative object. We will explore this in more depth in the accompanying code.

## Partitioning Around Medoids (PAM)
The Partitioning Around Medoids (PAM) algorithm is a popular realization of k-medoids clustering. It tackles the problem in an iterative, greedy way. As in the k-means algorithm, the initial representative objects (called seeds) are chosen arbitrarily. The algorithm then considers whether replacing a representative object with a non-representative object would improve the clustering quality. All possible replacements are tried, and this iterative process continues until no replacement can improve the quality of the resulting clustering. Quality is measured by a cost function of the average dissimilarity between an object and the representative object of its cluster.<br>
PAM starts from an initial set of medoids and iteratively replaces one of the medoids with one of the non-medoids whenever doing so improves the total distance of the resulting clustering.<br>
PAM works effectively for small data sets, but does not scale well to large data sets due to its computational complexity.

## K Medoid Clustering Process
<ol>
<li>Initialize: select k random points out of the n data points as the medoids.</li>
<li>Associate each data point with the closest medoid using any common distance metric.</li>
<li>While the cost decreases:<br>
For each medoid m and for each data point o which is not a medoid:
<ol>
<li>Swap m and o, associate each data point with the closest medoid, and recompute the cost.</li>
<li>If the total cost is more than that in the previous step, undo the swap.</li>
</ol>
</li>
</ol>
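
The swap loop above can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn-extra implementation; the `pam` function and its variable names are our own:

```python
import numpy as np

def pam(X, k, max_iter=100, seed=0):
    """Minimal Partitioning Around Medoids (PAM) sketch.

    X: (n, d) data array; k: number of clusters.
    Returns (medoid indices, cluster labels)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: select k random points as the initial medoids.
    medoids = rng.choice(n, size=k, replace=False)

    def cost(meds):
        # Total distance of every point to its nearest medoid.
        return dist[:, meds].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        # Step 3: try swapping each medoid m with each non-medoid o.
        for i in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = o
                c = cost(trial)
                if c < best:          # keep the swap only if the cost drops
                    best, medoids, improved = c, trial, True
        if not improved:              # stop when no swap helps
            break
    # Step 2: associate each point with its closest medoid.
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

# Two obvious clusters around (0, 0) and (10, 10).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
meds, labels = pam(X, k=2)
print(sorted(int(m) for m in meds), labels)
```

Note that, unlike k-means, the returned representatives are indices of actual data points, never synthetic centroids.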

## **LIBRARIES USED**
- Pandas
- NumPy
- matplotlib.pyplot
- scikit-learn
- scikit-learn-extra

## Advantages
<ul>
<li>It outputs the final clusters of objects in a fixed number of iterations.</li>
<li>It is simple to understand and easy to implement.</li>
<li>The K-Medoid algorithm is fast and converges in a fixed number of steps.</li>
<li>PAM is less sensitive to outliers than other partitioning algorithms.</li>
</ul>

## Disadvantages
<ul>
<li>The main disadvantage of K-Medoid algorithms is that they are not suitable for clustering non-spherical (arbitrarily shaped) groups of objects. This is because they rely on minimizing the distances between the non-medoid objects and the medoid (the cluster center); briefly, they use compactness as the clustering criterion instead of connectivity.</li>
<li>It may produce different results on different runs over the same dataset, because the first k medoids are chosen randomly.</li>
</ul>

## Complexity
The time complexity of the algorithm is O(k(n − k)<sup>2</sup>) per iteration, where k is the number of clusters and n is the number of data points.

## Comparison with K Means
The k-medoids method is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean. However, the complexity of each iteration of the k-medoids algorithm is O(k(n − k)<sup>2</sup>). For large values of n and k, such computation becomes very costly, and much more costly than the k-means method. Both methods require the user to specify k, the number of clusters.
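
The robustness claim is easy to verify numerically: adding a single extreme outlier drags the mean far away but barely moves the medoid. A small NumPy illustration (the `medoid` helper is our own, operating on 1-D data):

```python
import numpy as np

points = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(points, 1000.0)

def medoid(x):
    # The medoid is the actual data point that minimizes the
    # total distance to all other points (here in 1-D).
    total = np.abs(x[:, None] - x[None, :]).sum(axis=1)
    return x[total.argmin()]

print(points.mean(), medoid(points))              # both 3.0 on clean data
print(with_outlier.mean(), medoid(with_outlier))  # mean jumps to ~169, medoid stays 3.0
```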

## **IMAGES**
<img src = "https://github.com/photon149/DS-ScriptsNook/blob/56c53773376f4f8d12231bfb50eb63ddb05c8f03/Machine%20Learning/Algorithms/K%20Medoids%20Clustering/Images/data_cluster.png">
<img src = "https://github.com/photon149/DS-ScriptsNook/blob/56c53773376f4f8d12231bfb50eb63ddb05c8f03/Machine%20Learning/Algorithms/K%20Medoids%20Clustering/Images/kmedoids_cluster.png">

## **CONCLUSION**

1. In this project, I implemented the unsupervised clustering technique called K-Medoids Clustering.
2. I created a random dataset using the make_blobs function from the scikit-learn library.
3. I then implemented the KMedoids clustering model using the scikit-learn-extra library.
4. After implementation we obtained a silhouette score of around 63.7%.
5. Finally, we compared it with two other widely used clustering techniques, KMeans and the Birch algorithm.

| Model | Silhouette Score |
| --- | --- |
| KMedoids | 0.637642 |
| KMeans | 0.594127 |
| Birch | 0.571567 |
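
A comparison along these lines can be sketched with scikit-learn's built-in estimators. The dataset parameters below are our own assumptions, so the exact scores will differ from the table above; KMedoids itself lives in the separate scikit-learn-extra package, so only KMeans and Birch are run here:

```python
from sklearn.cluster import KMeans, Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Random blob dataset similar in spirit to the one used in the notebook.
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

scores = {}
for name, model in [
    ("KMeans", KMeans(n_clusters=3, n_init=10, random_state=42)),
    ("Birch", Birch(n_clusters=3)),
    # ("KMedoids", KMedoids(n_clusters=3, random_state=42)),  # requires scikit-learn-extra
]:
    labels = model.fit_predict(X)
    # Silhouette ranges from -1 (bad) to +1 (well-separated clusters).
    scores[name] = silhouette_score(X, labels)

for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```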
