Commit 84ec941

Update k_means_clust.py (TheAlgorithms#8996)
* Update k_means_clust.py
* Apply suggestions from code review

---------

Co-authored-by: Tianyi Zheng <[email protected]>
1 parent b2e186f commit 84ec941

1 file changed: +10 -13 lines changed

machine_learning/k_means_clust.py

@@ -11,10 +11,10 @@
  - initial_centroids , initial centroid values generated by utility function(mentioned
  in usage).
  - maxiter , maximum number of iterations to process.
- - heterogeneity , empty list that will be filled with hetrogeneity values if passed
+ - heterogeneity , empty list that will be filled with heterogeneity values if passed
  to kmeans func.
  Usage:
- 1. define 'k' value, 'X' features array and 'hetrogeneity' empty list
+ 1. define 'k' value, 'X' features array and 'heterogeneity' empty list
  2. create initial_centroids,
  initial_centroids = get_initial_centroids(
  X,
@@ -31,8 +31,8 @@
  record_heterogeneity=heterogeneity,
  verbose=True # whether to print logs in console or not.(default=False)
  )
- 4. Plot the loss function, hetrogeneity values for every iteration saved in
- hetrogeneity list.
+ 4. Plot the loss function and heterogeneity values for every iteration saved in
+ heterogeneity list.
  plot_heterogeneity(
  heterogeneity,
  k
@@ -198,13 +198,10 @@ def report_generator(
  df: pd.DataFrame, clustering_variables: np.ndarray, fill_missing_report=None
  ) -> pd.DataFrame:
  """
- Function generates easy-erading clustering report. It takes 2 arguments as an input:
- DataFrame - dataframe with predicted cluester column;
- FillMissingReport - dictionary of rules how we are going to fill missing
- values of for final report generate (not included in modeling);
- in order to run the function following libraries must be imported:
- import pandas as pd
- import numpy as np
+ Generates a clustering report. This function takes 2 arguments as input:
+ df - dataframe with predicted cluster column
+ fill_missing_report - dictionary of rules on how we are going to fill in missing
+ values for final generated report (not included in modelling);
  >>> data = pd.DataFrame()
  >>> data['numbers'] = [1, 2, 3]
  >>> data['col1'] = [0.5, 2.5, 4.5]
@@ -306,10 +303,10 @@ def report_generator(
  a.columns = report.columns # rename columns to match report
  report = report.drop(
  report[report.Type == "count"].index
- ) # drop count values except cluster size
+ ) # drop count values except for cluster size
  report = pd.concat(
  [report, a, clustersize, clusterproportion], axis=0
- ) # concat report with clustert size and nan values
+ ) # concat report with cluster size and nan values
  report["Mark"] = report["Features"].isin(clustering_variables)
  cols = report.columns.tolist()
  cols = cols[0:2] + cols[-1:] + cols[2:-1]
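
The last unchanged lines of this hunk reorder the report columns so that the newly added `Mark` column ends up directly after the first two. A minimal sketch of that slice on a plain list follows. The column names are hypothetical and chosen only for illustration; in the file they come from `report.columns.tolist()`.

```python
# Hypothetical column names for illustration only; in the diff the list
# comes from report.columns.tolist(), with "Mark" appended as the last column.
cols = ["Features", "Type", "cluster_0", "cluster_1", "Mark"]

# Same slice arithmetic as in the diff: keep the first two columns, move the
# last column into third position, then append everything that sat in between.
reordered = cols[0:2] + cols[-1:] + cols[2:-1]

print(reordered)
# ['Features', 'Type', 'Mark', 'cluster_0', 'cluster_1']
```

The slice keeps every column exactly once, so the result can be used to reindex the frame without dropping data.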
