Commit e6049de

Ricky Reusser authored and Matthias Thoemmes committed
Refactor API
1 parent b579100, commit e6049de

19 files changed: +7588 -717 lines

.gitignore (+4)

@@ -0,0 +1,4 @@
+dist
+*.log
+lib-cov
+node_modules

.npmignore (+3)

@@ -0,0 +1,3 @@
+examples
+test
+www

.travis.yml (+12)

@@ -0,0 +1,12 @@
+language: node_js
+node_js:
+- "5"
+- "5.1"
+- "4"
+- "4.2"
+- "4.1"
+- "4.0"
+- "0.12"
+- "0.11"
+- "0.10"
+- "iojs"

README.md (+55 -91)

@@ -1,20 +1,12 @@
-kmpp
-====
+# kmpp

-When dealing with lots of data points, clustering algorithms may be needed in
-order to group them. The k-means algorithm partitions _n_ data points into
-_k_ clusters and finds the centroids of these clusters incrementally.
+When dealing with lots of data points, clustering algorithms may be used to group them. The k-means algorithm partitions _n_ data points into _k_ clusters and finds the centroids of these clusters incrementally.

-The basic k-means algorithm is initialized with _k_ centroids at random
-positions.
+The algorithm assigns data points to the closest cluster, and the centroids of each cluster are re-calculated. These steps are repeated until the centroids do not change anymore.

-It assigns data points to the closest cluster, the centroids of each
-cluster are re-calculated afterwards. These assignment/recalculating steps are
-repeated until the centroids are not changing anymore.
+The basic k-means algorithm is initialized with _k_ centroids at random positions. This implementation addresses some disadvantages of the arbitrary initialization method with the k-means++ algorithm (see "Further reading" at the end).

-This implementation addresses some disadvantages of the arbitrary
-initialization method with the k-means++ algorithm (see "Further reading" at the
-end).
+## Installation

 ## Installing via npm

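The assign-then-recompute loop described in the new introduction is Lloyd's iteration. Below is a minimal sketch of that loop in plain JavaScript, assuming points and centroids are equal-length arrays of numbers and squared Euclidean distance. The helper names (`squaredDistance`, `nearestCentroid`, `lloydStep`, `lloyd`) are hypothetical and the code is not taken from the kmpp source; its return shape only loosely mirrors the `converged`, `centroids`, `counts`, `assignments` and `iterations` fields listed in the README.

```javascript
// Sketch of Lloyd's k-means iteration; not kmpp's internal code.
// Points and centroids are arrays of equal-length number arrays.
function squaredDistance (a, b) {
  var sum = 0;
  for (var d = 0; d < a.length; d++) {
    var diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to point p.
function nearestCentroid (p, centroids) {
  var best = 0;
  var bestDist = Infinity;
  for (var j = 0; j < centroids.length; j++) {
    var dist = squaredDistance(p, centroids[j]);
    if (dist < bestDist) {
      bestDist = dist;
      best = j;
    }
  }
  return best;
}

// One iteration: assign every point, then move each centroid to the mean
// of its assigned points. An empty cluster keeps its previous centroid.
function lloydStep (points, centroids) {
  var dim = centroids[0].length;
  var sums = centroids.map(function () { return new Array(dim).fill(0); });
  var counts = new Array(centroids.length).fill(0);
  var assignments = points.map(function (p) {
    var j = nearestCentroid(p, centroids);
    counts[j]++;
    for (var d = 0; d < dim; d++) sums[j][d] += p[d];
    return j;
  });
  var next = centroids.map(function (c, j) {
    if (counts[j] === 0) return c.slice();
    return sums[j].map(function (s) { return s / counts[j]; });
  });
  return { centroids: next, assignments: assignments, counts: counts };
}

// Repeat until no centroid moves or maxIterations is reached.
function lloyd (points, initialCentroids, maxIterations) {
  var centroids = initialCentroids;
  var step = null;
  for (var it = 1; it <= maxIterations; it++) {
    step = lloydStep(points, centroids);
    var moved = step.centroids.some(function (c, j) {
      return squaredDistance(c, centroids[j]) > 0;
    });
    centroids = step.centroids;
    if (!moved) {
      return { converged: true, iterations: it, centroids: centroids,
        assignments: step.assignments, counts: step.counts };
    }
  }
  return { converged: false, iterations: maxIterations, centroids: centroids,
    assignments: step ? step.assignments : [], counts: step ? step.counts : [] };
}
```

Keeping an empty cluster's previous centroid is only one reasonable policy (another is to reseed it from a distant point); the sketch makes no claim about which one kmpp uses.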
@@ -23,83 +15,53 @@ Install kmpp as Node.js module via NPM:
 $ npm install kmpp
 ````

-## Setting up a new instance
-
-````js
-// var kmpp = require('kmpp'); /* When running in Node.js */
-var k = new kmpp();
-````
-
-## Attributes
-
-### kmpp [Boolean]
-
-Enables or disables k-means++ initialization. If disabled, the initial
-centroids are selected randomly. It is recommended to leave this setting
-enabled, as it reduces the amount of the actual algorithm's iteration steps.
-
-### k [number]
-
-This value defines the amount of clusters. You can let the module guess the
-amount by the rule of thumb with the guessK() function. It is crucial to select
-an appropriate value of clusters in order to find a good solution.
-
-### maxIterations [number]
-
-Defines the maximum amount of iterations which might be useful when performance
-is more important than accuracy. Disabled by default with the value -1.
-
-### converged
-
-Returns true when the clustering is finished, false otherwise.
-
-### iterations
-
-Returns the amount of iterations.
-
-## Methods
-
-### reset ()
-
-Clears data points and calculated results.
-
-### setPoints (points)
-
-setPoints assigns an array of data points which should be clustered and calls
-reset(). Use the format [{ x : x0, y : y0 }, ... , { x : xn, y: yn }].
-
-### guessK ()
-
-Guess the amount of clusters by the rule of thumb. (k = Math.sqrt( n * 0.5)).
-See below for advice for choosing the right value for k.
-
-### initCentroids ()
-
-The initial centroids are selected by the k-means++ algorithm or randomly. The
-latter behavior is disabled by default. k-means++ finds initial values close to
-the final result, therefore, less iterations are required for the final result
-usually.
-
-### iterate ()
-
-As k-means is an incremental algorithm, the iterate function should be called
-until the centroids do not change anymore.
-
-### cluster (callback)
-
-Convenience function which calls the iterate() function until the algorithm has
-finished.
-
-# Tests
-
-For the moment, you could open index.html or index-animated.html in your
-browser.
-
-# Todo
-
-* remove the dependency on jQuery
-* add build tools
-* better testing and visualization
+## Example
+
+```javascript
+var kmpp = require('kmpp');
+
+kmpp([
+  [x1, y1, ...],
+  [x2, y2, ...],
+  [x3, y3, ...],
+  ...
+], {
+  k: 3
+});
+
+// =>
+// { converged: true,
+//   centroids: [[xm1, ym1, ...], [xm2, ym2, ...], [xm3, ym3, ...]],
+//   counts: [ 7, 6, 7 ],
+//   assignments: [ 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]
+// }
+```
+
+## API
+
+### `kmpp(points[, opts])`
+
+Executes the k-means++ algorithm on `points`.
+
+Arguments:
+- `points` (`Array`): An array-of-arrays containing the points in format `[[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]`
+- `opts`: object containing configuration parameters. Parameters are:
+  - `distance` (`function`): Optional function that takes two points and returns the distance between them.
+  - `initialize` (`Boolean`): Perform initialization. If false, uses the initial state provided in `centroids` and `assignments`. Otherwise discards any initial state and performs initialization.
+  - `k` (`Number`): number of centroids. If not provided, `sqrt(n / 2)` is used, where `n` is the number of points.
+  - `kmpp` (`Boolean`, default: `true`): If true, uses k-means++ initialization. Otherwise uses naive random assignment.
+  - `maxIterations` (`Number`, default: `100`): Maximum allowed number of iterations.
+  - `norm` (`Number`, default: `2`): L-norm used for distance computation. `1` is Manhattan norm, `2` is Euclidean norm. Ignored if `distance` function is provided.
+  - `centroids` (`Array`): An array of centroids. If `initialize` is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.
+  - `assignments` (`Array`): An array of assignments. Used for initialization, otherwise overwritten.
+  - `counts` (`Array`): An output array used to avoid extra allocation. Values are discarded and overwritten.
+
+Returns an object containing information about the centroids and point assignments. Values are:
+- `converged`: `true` if the algorithm converged successfully
+- `centroids`: a list of centroids
+- `counts`: the number of points assigned to each respective centroid
+- `assignments`: a list of integer assignments of each point to the respective centroid
+- `iterations`: number of iterations used

 # Credits

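The `norm`, `distance` and `k` options documented in the API section can be illustrated with a few small helpers. This sketch assumes `norm` selects the Minkowski (L-p) distance and that the `sqrt(n / 2)` default is rounded to the nearest integer with a minimum of 1; the rounding rule and the names `minkowskiDistance`, `chebyshevDistance` and `defaultK` are assumptions for illustration, not kmpp's documented behavior or code.

```javascript
// Hypothetical helpers illustrating the README's `norm`, `distance` and
// `k` options; not taken from the kmpp source.

// Minkowski (L-p) distance: p = 1 is Manhattan, p = 2 is Euclidean.
function minkowskiDistance (a, b, p) {
  var sum = 0;
  for (var d = 0; d < a.length; d++) {
    sum += Math.pow(Math.abs(a[d] - b[d]), p);
  }
  return Math.pow(sum, 1 / p);
}

// A custom two-argument distance of the shape the `distance` option
// describes: Chebyshev (L-infinity), the largest per-axis difference.
function chebyshevDistance (a, b) {
  var max = 0;
  for (var d = 0; d < a.length; d++) {
    max = Math.max(max, Math.abs(a[d] - b[d]));
  }
  return max;
}

// Rule-of-thumb cluster count k = sqrt(n / 2). Rounding to the nearest
// integer and clamping to at least 1 are assumptions of this sketch.
function defaultK (n) {
  return Math.max(1, Math.round(Math.sqrt(n / 2)));
}
```

For `p < 1` (the example's `norm` slider goes down to 0.5) this formula no longer satisfies the triangle inequality, so it acts as a dissimilarity rather than a true norm. When distances are only compared, as in a nearest-centroid search, the final `1 / p` root can be skipped because it is monotonic.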
@@ -108,6 +70,8 @@ browser.
 for measurements and improved the random initialization by choosing from
 points

+* [Ricky Reusser](https://github.com/rreusser) refactored API
+
 # Further reading

 * [Wikipedia: k-means clustering](https://en.wikipedia.org/wiki/K-means_clustering)
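The k-means++ initialization the introduction refers to picks the first centroid uniformly at random and each further centroid with probability proportional to the squared distance from a point to its nearest already-chosen centroid (D² sampling). A minimal sketch, assuming squared Euclidean distance and an injectable `random` function for reproducible runs; `kmeansPlusPlusInit` is a hypothetical name and this is not kmpp's implementation.

```javascript
// Sketch of k-means++ seeding (D^2 sampling); not kmpp's code.
// Points are equal-length arrays of numbers. `random` must return values in
// [0, 1); it defaults to Math.random and can be swapped for a seeded generator.
function squaredDistance (a, b) {
  var sum = 0;
  for (var d = 0; d < a.length; d++) {
    var diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

function kmeansPlusPlusInit (points, k, random) {
  random = random || Math.random;
  // First centroid: a uniformly random point.
  var centroids = [points[Math.floor(random() * points.length)].slice()];
  // d2[i]: squared distance from points[i] to its nearest chosen centroid.
  var d2 = points.map(function (p) { return squaredDistance(p, centroids[0]); });

  while (centroids.length < k) {
    var total = d2.reduce(function (s, v) { return s + v; }, 0);
    var next = -1;
    if (total === 0) {
      // Every point coincides with a chosen centroid; fall back to a uniform pick.
      next = Math.floor(random() * points.length);
    } else {
      // Pick index i with probability d2[i] / total.
      var r = random() * total;
      for (var i = 0; i < points.length; i++) {
        if (d2[i] > 0) {
          next = i; // remembers the last positive-weight index in case of rounding
          if (r < d2[i]) break;
          r -= d2[i];
        }
      }
    }
    var c = points[next].slice();
    centroids.push(c);
    // Update each point's distance to its nearest centroid.
    for (var j = 0; j < points.length; j++) {
      d2[j] = Math.min(d2[j], squaredDistance(points[j], c));
    }
  }
  return centroids;
}
```

Points that already coincide with a chosen centroid get zero weight, so the seeds are distinct whenever the input contains at least `k` distinct points.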
@@ -117,4 +81,4 @@ browser.

 # License

-MIT License
+© 2017. MIT License.

docs/index.html (+16)

Large diffs are not rendered by default.

example/index.js (+192)

@@ -0,0 +1,192 @@
+var regl = require('regl');
+var failNicely = require('fail-nicely');
+var panel = require('control-panel');
+var hsl = require('float-hsl2rgb');
+var css = require('insert-css');
+var kmpp = require('../');
+var h = require('h');
+
+css(`
+  .control-panel {
+    z-index: 100;
+    position: relative;
+  }
+
+  .progress {
+    background: rgba(255, 255, 255, 0.8);
+    position: absolute;
+    bottom: 10px;
+    left: 10px;
+    z-index: 1;
+  }
+`);
+
+var progress = h('pre.progress');
+document.body.appendChild(progress);
+
+regl({
+  onDone: failNicely((regl) => {
+    var pointBuf = regl.buffer();
+    var pointColorBuf = regl.buffer();
+    var centroidBuf = regl.buffer();
+    var centroidColorBuf = regl.buffer();
+
+    var x, iteration;
+    var km = {};
+    var settings = {
+      norm: 2,
+      points: 5000,
+      k: 0,
+      kmpp: true,
+      uniformity: 0.5,
+      periodicity: 4
+    };
+
+    function distribution (x, y) {
+      var w = Math.PI * settings.periodicity;
+      return Math.abs(
+        Math.cos(x * w) *
+        Math.sin((y - x / 2) * w) *
+        Math.sin((y + x / 2) * w)
+      );
+    }
+
+    function restart () {
+      iteration = 0;
+      progress.textContent = '';
+      delete km.assignments;
+      delete km.centroids;
+      km.converged = false;
+    }
+
+    function initialize () {
+      var ar = window.innerWidth / window.innerHeight;
+      var i = 0;
+      x = [];
+      while (i < settings.points) {
+        // Random points; we'll scale these to the viewport:
+        var xp = 2.0 * (Math.random() - 0.5) * (ar > 1 ? ar : 1);
+        var yp = 2.0 * (Math.random() - 0.5) * (ar > 1 ? 1 : 1 / ar);
+
+        if (Math.pow(distribution(xp, yp), (1.0 - settings.uniformity) * 2.0) > Math.random()) {
+          x[i++] = [xp, yp];
+        }
+      }
+      restart();
+    }
+
+    var drawPoints = regl({
+      vert: `
+        precision mediump float;
+        attribute vec2 xy;
+        attribute vec3 color;
+        uniform float size;
+        uniform vec2 aspect;
+        varying vec3 col;
+        void main () {
+          col = color;
+          gl_Position = vec4(xy * aspect, 0, 1);
+          gl_PointSize = size;
+        }
+      `,
+      frag: `
+        precision mediump float;
+        uniform float alpha;
+        varying vec3 col;
+        uniform float size;
+        void main () {
+          vec2 uv = gl_PointCoord - 0.5;
+          float r = length(uv) * size * 2.0;
+
+          gl_FragColor = vec4(col, alpha * smoothstep(size, size - 2.0, r));
+        }
+      `,
+      depth: {enable: false},
+      blend: {
+        enable: true,
+        func: {srcRGB: 'src alpha', srcAlpha: 1, dstRGB: 1, dstAlpha: 1},
+        equation: {rgb: 'reverse subtract', alpha: 'add'}
+      },
+      attributes: {
+        xy: regl.prop('xy'),
+        color: regl.prop('color')
+      },
+      uniforms: {
+        size: (ctx, props) => ctx.pixelRatio * props.size,
+        alpha: regl.prop('alpha'),
+        aspect: ctx => {
+          var w = ctx.viewportWidth;
+          var h = ctx.viewportHeight;
+          return w / h > 1 ? [h / w, 1] : [1, w / h];
+        }
+      },
+      primitive: 'points',
+      count: (ctx, props) => props.xy._buffer.byteLength / 8
+    });
+
+    panel([
+      {label: 'norm', type: 'range', min: 0.5, max: 4, step: 0.5, initial: settings.norm},
+      {label: 'k', type: 'range', min: 0, max: 100, step: 1, initial: settings.k},
+      {label: 'points', type: 'range', min: 1000, max: 20000, step: 100, initial: settings.points},
+      {label: 'uniformity', type: 'range', min: 0, max: 1, step: 0.1, initial: settings.uniformity},
+      {label: 'periodicity', type: 'range', min: 1, max: 10, step: 0.5, initial: settings.periodicity},
+      {label: 'kmpp', type: 'checkbox', initial: settings.kmpp},
+      {label: 'restart', type: 'button', action: restart}
+    ], {position: 'top-left', width: 350}).on('input', (data) => {
+      var needsInitialize = false;
+      var needsRestart = false;
+      if ((data.points !== settings.points) || (data.uniformity !== settings.uniformity) || (data.periodicity !== settings.periodicity)) {
+        needsInitialize = true;
+      } else if ((data.k !== settings.k) || (data.kmpp !== settings.kmpp)) {
+        needsRestart = true;
+      } else if (data.norm !== settings.norm) {
+        km.converged = false;
+      }
+      Object.assign(settings, data);
+      if (needsRestart) restart();
+      if (needsInitialize) initialize();
+    });
+
+    initialize();
+
+    window.addEventListener('resize', initialize, false);
+
+    iteration = 0;
+    regl.frame(({tick}) => {
+      if (km.converged) return;
+
+      iteration++;
+
+      km = kmpp(x, Object.assign({maxIterations: 1,
+        norm: settings.norm,
+        k: settings.k === 0 ? undefined : settings.k,
+        kmpp: settings.kmpp
+      }, km));
+
+      progress.textContent = km.converged ? ('converged after ' + iteration + ' iterations') : ('iteration: ' + iteration);
+
+      var colorList = new Array(km.centroids.length).fill(0).map((d, i) => hsl([i / km.centroids.length, 0.5, 0.5]));
+
+      pointColorBuf({data: km.assignments.map(i => colorList[i])});
+      centroidColorBuf({data: colorList});
+      pointBuf({data: x});
+      centroidBuf({data: km.centroids});
+
+      regl.clear({color: [1, 1, 1, 1]});
+
+      drawPoints({
+        xy: pointBuf,
+        size: 5,
+        color: pointColorBuf,
+        alpha: 0.25 * Math.sqrt(5000 / settings.points * window.innerWidth * window.innerHeight / 600 / 600)
+      });
+
+      drawPoints({
+        xy: centroidBuf,
+        size: 15,
+        color: centroidColorBuf,
+        alpha: 1.0
+      });
+    });
+  })
+});
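`initialize()` in example/index.js generates its data by rejection sampling: a uniform candidate in the viewport is kept only if `distribution(xp, yp)` raised to `(1 - uniformity) * 2` exceeds a fresh `Math.random()` value. With `uniformity: 1` the exponent is 0, so every candidate is accepted; with `uniformity: 0` a candidate survives with probability `distribution(xp, yp)²`, which concentrates points near the peaks of the pattern. The sketch below lifts that loop out of regl and the DOM; `samplePoints`, its `opts.aspect` stand-in for `window.innerWidth / window.innerHeight`, the optional `random` argument and the explicit `periodicity` parameter are assumptions for illustration.

```javascript
// Sketch of the rejection sampling in example/index.js's initialize(),
// lifted out of regl and the DOM. `opts.aspect` stands in for
// window.innerWidth / window.innerHeight; `random` defaults to Math.random.
function distribution (x, y, periodicity) {
  var w = Math.PI * periodicity;
  return Math.abs(
    Math.cos(x * w) *
    Math.sin((y - x / 2) * w) *
    Math.sin((y + x / 2) * w)
  );
}

function samplePoints (n, opts, random) {
  random = random || Math.random;
  var ar = opts.aspect;
  var exponent = (1.0 - opts.uniformity) * 2.0;
  var points = [];
  while (points.length < n) {
    // Uniform candidate: x in [-ar, ar), y in [-1, 1) for wide viewports,
    // x in [-1, 1), y in [-1/ar, 1/ar) for tall ones.
    var xp = 2.0 * (random() - 0.5) * (ar > 1 ? ar : 1);
    var yp = 2.0 * (random() - 0.5) * (ar > 1 ? 1 : 1 / ar);
    // Accept with probability distribution(xp, yp) ** exponent.
    if (Math.pow(distribution(xp, yp, opts.periodicity), exponent) > random()) {
      points.push([xp, yp]);
    }
  }
  return points;
}
```

Because the acceptance rate falls as `uniformity` approaches 0, larger `points` settings with low uniformity cost proportionally more candidate draws.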
