When dealing with lots of data points, clustering algorithms may be used to group them. The k-means algorithm partitions _n_ data points into _k_ clusters and finds the centroids of these clusters iteratively.

The basic k-means algorithm is initialized with _k_ centroids at random positions. This implementation uses the k-means++ algorithm to address some disadvantages of this arbitrary initialization (see "Further reading" at the end).

The algorithm then assigns each data point to the cluster with the closest centroid and recalculates the centroid of each cluster. These two steps are repeated until the centroids stop changing.
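The initialization and the assignment/update loop described above can be sketched in a few lines of JavaScript. This is an illustrative sketch under stated assumptions, not kmpp's source: `sqDist`, `kmeansPlusPlusInit`, `kmeans` and the injectable `rng` parameter are hypothetical names, points are assumed to be arrays of numbers compared with Euclidean distance, and the loop stops when no assignment changes, which means the centroids would not change either.

```javascript
// Hypothetical sketch of k-means with k-means++ seeding; not kmpp's implementation.
// Points are arrays of numbers, e.g. [[x1, y1], [x2, y2], ...].

// Squared Euclidean distance between two points of equal dimension.
function sqDist(a, b) {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// k-means++ seeding: the first centroid is a uniformly random point; each further
// centroid is a point sampled with probability proportional to its squared
// distance to the nearest centroid chosen so far.
function kmeansPlusPlusInit(points, k, rng = Math.random) {
  const centroids = [points[Math.floor(rng() * points.length)].slice()];
  while (centroids.length < k) {
    const d2 = points.map((p) => Math.min(...centroids.map((c) => sqDist(p, c))));
    let r = rng() * d2.reduce((acc, v) => acc + v, 0);
    let idx = 0;
    // Walk the cumulative distribution until r falls inside a point's share.
    while (idx < d2.length - 1 && r >= d2[idx]) {
      r -= d2[idx];
      idx++;
    }
    centroids.push(points[idx].slice());
  }
  return centroids;
}

// Lloyd iterations: assign points to the nearest centroid, then move each
// centroid to the mean of its points, until no assignment changes.
function kmeans(points, k, maxIterations = 100, rng = Math.random) {
  const dim = points[0].length;
  let centroids = kmeansPlusPlusInit(points, k, rng);
  const assignments = new Array(points.length).fill(-1);
  let iterations = 0;
  let converged = false;

  while (iterations < maxIterations) {
    iterations++;

    // Assignment step: pick the nearest centroid (ties go to the lower index).
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      for (let j = 1; j < k; j++) {
        if (sqDist(p, centroids[j]) < sqDist(p, centroids[best])) best = j;
      }
      if (assignments[i] !== best) {
        assignments[i] = best;
        changed = true;
      }
    });

    // Unchanged assignments would reproduce the same centroids, so stop here.
    if (!changed) {
      converged = true;
      break;
    }

    // Update step: recompute each centroid as the mean of its assigned points.
    const sums = Array.from({ length: k }, () => new Array(dim).fill(0));
    const counts = new Array(k).fill(0);
    points.forEach((p, i) => {
      const j = assignments[i];
      counts[j]++;
      for (let d = 0; d < dim; d++) sums[j][d] += p[d];
    });
    // A centroid that lost all of its points is left where it was.
    centroids = centroids.map((c, j) =>
      counts[j] > 0 ? sums[j].map((s) => s / counts[j]) : c
    );
  }

  return { converged, centroids, assignments, iterations };
}
```

For example, `kmeans([[0, 0], [0, 1], [1, 0], [100, 100], [100, 101], [101, 100]], 2)` should place the first three and the last three points in separate clusters. Taking the random source as a parameter is a design choice of this sketch that lets a seeded generator make runs reproducible.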
## Installation
### Installing via npm

Install kmpp as a Node.js module via npm:

````
$ npm install kmpp
````
-`points` (`Array`): An array-of-arrays containing the points in format `[[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]`
48
+
-`opts`: object containing configuration parameters. Parameters are
49
+
-`distance` (`function`): Optional function that takes two points and returns the distance between them.
50
+
-`initialize` (`Boolean`): Perform initialization. If false, uses the initial state provided in `centroids` and `assignments`. Otherwise discards any initial state and performs initialization.
51
+
-`k` (`Number`): number of centroids. If not provided, `sqrt(n / 2)` is used, where `n` is the number of points.
52
+
-`kmpp` (`Boolean`, default: `true`): If true, uses k-means++ initialization. Otherwise uses naive random assignment.
53
+
-`maxIterations` (`Number`, default: `100`): Maximum allowed number of iterations.
54
+
-`norm` (`Number`, default: `2`): L-norm used for distance computation. `1` is Manhattan norm, `2` is Euclidean norm. Ignored if `distance` function is provided.
55
+
-`centroids` (`Array`): An array of centroids. If `initialize` is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.
56
+
-`assignments` (`Array`): An array of assignments. Used for initialization, otherwise overwritten.
57
+
-`counts` (`Array`): An output array used to avoid extra allocation. Values are discarded and overwritten.
58
+
59
+
Returns an object containing information about the centroids and point assignments. Values are:
60
+
-`converged`: `true` if the algorithm converged successfully
61
+
-`centroids`: a list of centroids
62
+
-`counts`: the number of points assigned to each respective centroid
63
+
-`assignments`: a list of integer assignments of each point to the respective centroid
64
+
-`iterations`: number of iterations used
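To make the `norm` option, the default `k` and the returned `assignments` more concrete, here is a small JavaScript sketch. It does not call kmpp: `lpDistance`, `defaultK` and `groupByCluster` are hypothetical helpers, the rounding and the minimum of 1 in `defaultK` are assumptions (the text above only gives `sqrt(n / 2)`), and `result` is hand-written sample data in the documented shape, not real output.

```javascript
// Hypothetical helpers illustrating the options and return value documented above;
// they are not part of kmpp's API.

// L-p (Minkowski) distance: p = 1 is the Manhattan norm, p = 2 the Euclidean norm.
function lpDistance(a, b, p = 2) {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    sum += Math.pow(Math.abs(a[d] - b[d]), p);
  }
  return Math.pow(sum, 1 / p);
}

// Rule-of-thumb cluster count sqrt(n / 2); rounding and the minimum of 1 are assumptions.
function defaultK(n) {
  return Math.max(1, Math.round(Math.sqrt(n / 2)));
}

// Groups the input points by cluster index using the `assignments` array of a
// result object shaped like the one described above.
function groupByCluster(points, result) {
  const clusters = result.centroids.map(() => []);
  result.assignments.forEach((clusterIndex, i) => {
    clusters[clusterIndex].push(points[i]);
  });
  return clusters;
}

// Hand-written sample data in the documented result shape (not real kmpp output).
const points = [[0, 0], [5, 5], [1, 0]];
const result = {
  converged: true,
  centroids: [[0.5, 0], [5, 5]],
  counts: [2, 1],
  assignments: [0, 1, 0],
  iterations: 2,
};

const clusters = groupByCluster(points, result);
// Each group should contain as many points as the matching entry in `counts`.
console.log(clusters.map((c) => c.length)); // [ 2, 1 ]
```

A wrapper such as `(a, b) => lpDistance(a, b, 1)` has the two-point signature that the `distance` option is documented to expect.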
# Credits
for measurements and improved the random initialization by choosing from
points
* [Ricky Reusser](https://github.com/rreusser) refactored API