Machine Learning with Javascript by Stephen Grider
- ../resources
- MLKits - starter kits
- MLCasts - complete code
- 01-introduction
- plinko
- 02-algorithm
- plinko : using lodash
node index.js
- 03-tensorflow: tensorflow features for house price project
- 04-tf-app
- knn-tf-house-price: house price
- 06-gradient-descent
- regressions
- 07-vectorized
- regressions
- 08-plot
- regressions
- 09: batch stochastic gradient descent
- regressions : (linear complete)
- 10-natural-binary-classification
- regressions
- linear-regression (complete)
- logistic-regression
- regressions
- 11-multi-value-classifications
- regressions/multinominal-logistic-regression (complete)
- 12-image-recognition
- regressions/multinominal-logistic-regressions
- 13-performance-optimization
- regressions/multinominal-logistic-regressions (image-recognition complete)
- 14-loadcsv
- loadcsv: csv loading project
- Identify the independent and dependent variables
Features are categories of data points that affect the value of a label
- Assemble a set of data related to the problem you're trying to solve
- Datasets almost always require cleanup and formatting
- Decide on the type of output you are predicting
Regression is used with continuous values; classification is used with discrete values
- Based on type of output, pick an algorithm that will determine a correlation between your features and labels
- Many, many different algorithms exist, each with pros and cons
- Use model generated by algorithm to make a prediction
- Models relate the value of features to the value of labels
- Classification: a few or two options
  - The value of our labels belongs to a discrete set
- Regression: a range
  - The value of our labels belongs to a continuous set
| Feature | Label |
|---|---|
| Drop Position | Bucket a ball lands in |
| Ball Bounciness | |
| Ball Size | |

Changing one of the features will probably change the label
- We choose Classification -> Bucket #1 ~ #10
- Algorithm: K-Nearest Neighbors (KNN)
- "Birds of a feather flock together"
K-Nearest Neighbors (with one independent variable)
- Adjust the parameters of the analysis (e.g. const k = 3;)
- Add more features to explain the analysis
- Change the prediction point
- Accept that maybe there isn't a good correlation
Start with option 2: "Add more features to explain the analysis"
Pythagorean Theorem: a^2 + b^2 = c^2
C = (A ** 2 + B ** 2) ** 0.5
const outputs = [
[40, 0.5, 16, 1],
[150, 0.52, 16, 2],
[350, 0.55, 16, 2],
[425, 0.53, 16, 3]
];
const target = [323, 0.52, 16, 2];
C ** 2 = A ** 2 + B ** 2
C = (A ** 2 + B ** 2) ** 0.5
C = (((350 - 323) ** 2) + ((0.55 - 0.52) ** 2)) ** 0.5

3D Pythagorean Theorem: a^2 + b^2 + c^2 = d^2
D = (A ** 2 + B ** 2 + C ** 2) ** 0.5
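The 2D and 3D distance formulas generalize to any number of features. A minimal plain-JS sketch (a hypothetical helper, not the course's code):

```javascript
// Euclidean distance between two points with any number of features.
function distance(pointA, pointB) {
  return (
    pointA
      .map((value, i) => (value - pointB[i]) ** 2) // squared difference per feature
      .reduce((sum, sq) => sum + sq, 0) ** 0.5     // square root of the sum
  );
}

distance([350, 0.55], [323, 0.52]); // ≈ 27, matching the 2D example above
```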
Normalized Dataset = (FeatureValue - min) / (max - min)
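That formula can be applied to a column of values; a sketch (illustrative helper, not the course's implementation):

```javascript
// Min-max normalization: rescale a column of feature values to the 0-1 range.
function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map(value => (value - min) / (max - min));
}

normalize([0, 5, 10]); // → [0, 0.5, 1]
```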
Not all features give us a good guess.
Some features are not giving us good accuracy
the length of data: 1596
score.js:47 For feature of 0 accuracy: 0.32
score.js:47 For feature of 1 accuracy: 0.15
score.js:47 For feature of 2 accuracy: 0.03

dropPosition is a good selection feature;
bounciness and size are not.
- Features vs Labels
- Test vs Training sets of data
- Feature Normalization
- Common data structures (arrays of arrays)
- Feature Selection
Lodash
- Pros
- Methods for just about everything we need
- Excellent API design (especially chain!)
- Skills transferrable to other JS projects
- Cons
- Extremely slow (relatively)
- Not 'numbers' focused
- Some things are awkward (getting a column of values)
TensorFlow JS
- Pros
- Similar API to Lodash
- Extremely fast for numeric calculations
- Has a 'low level' linear algebra API + higher level API for ML
- Similar API to numpy, a popular Python numerical lib
- Cons
- Still in active development
[]: 1 dimensional
[[]]: 2 dimensional
[[[]]]: 3 dimensional

// 1 dimensional
[5, 10, 17] -> shape [3]

// 2 dimensional
[
  [5, 10, 17],
  [5, 10, 17],
] -> shape [2, 3]

// 3 dimensional
[
  [
    [5, 10, 17],
  ],
] -> shape [1, 1, 3]

2D is the most important dimension we will work with:
[# rows, # columns] -> [2, 3]
Broadcasting works when:
- Taking the shapes of both tensors, from right to left the dimensions are equal or one of them is 1
- Shape [3] and Shape [1] => O
  - [3]
  - [1]
- Shape [2, 3] and Shape [2, 1] => O
  - [2, 3]
  - [2, 1]
- Shape [2, 3, 2] and Shape [3, 1] => O
  - [2, 3, 2]
  - [   3, 1]
- Shape [2, 3, 2] and Shape [2, 1] => X
  - [2, 3, 2]
  - [   2, 1]
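The rule above can be expressed as a small checker (a plain-JS sketch of the rule, not a tfjs API):

```javascript
// Broadcasting check: align shapes from the right; every pair of dimensions
// must be equal, or one of the pair must be 1.
function canBroadcast(shapeA, shapeB) {
  const [longer, shorter] =
    shapeA.length >= shapeB.length ? [shapeA, shapeB] : [shapeB, shapeA];
  const offset = longer.length - shorter.length;
  return shorter.every(
    (dim, i) => dim === longer[i + offset] || dim === 1 || longer[i + offset] === 1
  );
}

canBroadcast([2, 3], [2, 1]);    // true
canBroadcast([2, 3, 2], [3, 1]); // true
canBroadcast([2, 3, 2], [2, 1]); // false (3 vs 2 on the middle axis)
```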
- Which bucket will a ball go into? -> Classification
- What is the price of a house? -> Regression
- Find distance between features and prediction point
- Sort from lowest point to greatest
- Take the top K records
- Average the label value of those top K records
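The four steps above can be sketched in plain JS, assuming `features` is an array of rows and `labels` holds the matching label values (a hypothetical layout, for illustration):

```javascript
// KNN regression: predict a label by averaging the K nearest observations.
function knn(features, labels, predictionPoint, k) {
  return (
    features
      .map((row, i) => [
        // 1. distance between each row and the prediction point
        row.reduce((sum, v, j) => sum + (v - predictionPoint[j]) ** 2, 0) ** 0.5,
        labels[i],
      ])
      .sort((a, b) => a[0] - b[0]) // 2. sort from lowest distance to greatest
      .slice(0, k)                 // 3. take the top K records
      .reduce((sum, [, label]) => sum + label, 0) / k // 4. average their labels
  );
}

knn([[300], [350], [400]], [1, 2, 3], [340], 2); // → 1.5 (average of labels 2 and 1)
```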
// plinko
// features and bucket were in the same structure
[
[350, 0.55, 16, 2],
[350, 0.55, 16, 2]
];
// house-price
// features and labels are separated
const features = [
[84, 83],
[84.1, 85]
];
const housePrice = [[200], [250]];

distance = ((lonA - lonB) ** 2 + (latA - latB) ** 2) ** 0.5
- tf.unstack
- turns the tensor data into a normal JavaScript array
npm install --save @tensorflow/tfjs-node lodash shuffle-seed

Initial analysis
Error: 15% Guess: 925420 , Expected 1085000
Error: -36% Guess: 636235 , Expected 466800
Error: -11% Guess: 472810 , Expected 425000
Error: -23% Guess: 695514.3 , Expected 565000
Error: 21% Guess: 600730 , Expected 759000
Error: -12% Guess: 573287.2 , Expected 512031
Error: -1% Guess: 773849.5 , Expected 768000
Error: 75% Guess: 381626.2 , Expected 1532500
Error: -199% Guess: 613175 , Expected 204950
Error: -71% Guess: 423569.9 , Expected 247000

For example, if one value is extremely high or low,
normalization wouldn't mean much.
Then standardization would be a better option:
(Value - Average) / StandardDeviation
StandardDeviation = sqrt(variance)
StandardDeviation = variance ** 0.5
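A sketch of standardization over one column (illustrative helper, not the course's tensor version):

```javascript
// Standardization: (value - mean) / standardDeviation,
// where standardDeviation = variance ** 0.5.
function standardize(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return values.map(v => (v - mean) / variance ** 0.5);
}

standardize([2, 4, 4, 4, 5, 5, 7, 9]); // mean 5, std 2 → first value (2 - 5) / 2 = -1.5
```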
# dataColumns: ['lat', 'long', 'sqft_lot'],
Error: -15% Guess: 1245050 , Expected 1085000
Error: -64% Guess: 765837.1 , Expected 466800
Error: -100% Guess: 848675 , Expected 425000
Error: -38% Guess: 781742 , Expected 565000
Error: -3% Guess: 781470 , Expected 759000
Error: 0% Guess: 514000 , Expected 512031
Error: -6% Guess: 814785 , Expected 768000
Error: 49% Guess: 774700 , Expected 1532500
Error: -19% Guess: 243402.5 , Expected 204950
Error: 2% Guess: 242865 , Expected 247000

node --inspect-brk index.js

Then navigate to chrome://inspect in the browser.
We can inspect the code using breakpoints and the console.
features.sub(mean).div(variance.pow(0.5)).print();
features
.sub(mean)
.div(variance.pow(0.5))
// .sub(predictionPoint)
.sub(scaledPrediction)
.pow(2)
.sum(1)
.pow(0.5)
.print();

# tremendous improvement!
# dataColumns: ['lat', 'long', 'sqft_lot', 'sqft_living'],
node index.js
Error: -15% Guess: 1251260 , Expected 1085000
Error: -11% Guess: 519756.5 , Expected 466800
Error: -2% Guess: 433700 , Expected 425000
Error: 19% Guess: 455800 , Expected 565000
Error: 8% Guess: 699750 , Expected 759000
Error: -14% Guess: 584260 , Expected 512031
Error: -9% Guess: 835450 , Expected 768000
Error: 13% Guess: 1329790 , Expected 1532500
Error: -36% Guess: 279422.5 , Expected 204950
Error: 7% Guess: 228767.5 , Expected 247000

Linear Regression
- Pros
- Fast! Only train one time, then use for any prediction
- Uses methods that will be very important in more complicated ML
- Cons
- Lot harder to understand intuitively
price = 200 * Lot Size + 3000
You can create a chart and add a trend line based on the data (using an equation).
But that only works for one independent variable and one dependent variable.
With linear regression, we can use an arbitrary number of independent variables for one output.
- Ordinary Least Squares
- Generalized Least Squares
- ...others
- Gradient Descent
MSE = (1/n) * Σ (Y_i - Ŷ_i)²
- MSE = mean squared error
- n = number of data points
- Y_i = observed values
- Ŷ_i = predicted values
- Bad guess: y = 0x + 1
  - Mean Squared Error (how wrong were we?):
    ((1-200)**2 + (1-230)**2 + (1-245)**2 + (1-274)**2 + (1-259)**2 + (1-262)**2) / 6 = 360792 / 6 = 60132
- Better guess: y = 0x + 200
  - Mean Squared Error:
    ((200-200)**2 + (200-230)**2 + (200-245)**2 + (200-274)**2 + (200-259)**2 + (200-262)**2) / 6 = 15726 / 6 = 2621
Price = m * Lot Size + b
- 'm' and 'b' will be as correct as they can be when MSE is as low as possible
- We need to find the lowest MSE
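The two MSE calculations above, sketched in JS with the data from the example:

```javascript
// Mean squared error of a set of guesses against the observed labels.
function meanSquaredError(labels, guesses) {
  return (
    labels.reduce((sum, label, i) => sum + (label - guesses[i]) ** 2, 0) /
    labels.length
  );
}

const labels = [200, 230, 245, 274, 259, 262];
meanSquaredError(labels, labels.map(() => 1));   // 60132 (bad guess, y = 0x + 1)
meanSquaredError(labels, labels.map(() => 200)); // 2621  (better guess, y = 0x + 200)
```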
Price = m * Lot Size + b
- Don't know the possible range of b
- Don't know a step size for incrementing b
- Huge computational demands when adding in more features
- Wolfram Alpha - Computational Intelligence
  - search: y = x^2 + 5
  - search: derivative x^2 + 5
  - On the derivative's plot, the y value at any x is the slope of the original curve
- Pick a value for 'b'
- Calculate the slope of MSE with respect to 'b' (the derivative)
- Is the slope very, very small? If yes, we are done!
- Multiply the slope by an arbitrary small value called a 'learning rate'
- Subtract that from 'b'
- Go back to 2
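Those steps, sketched for a model that only learns 'b' (y = 0x + b). The learning rate and stopping threshold are arbitrary, and the slope of MSE with respect to b is (2/n) * Σ (guess - actual):

```javascript
// Gradient descent on 'b' alone, following the steps above.
function descendB(labels, learningRate = 0.1, maxIterations = 1000) {
  let b = 0; // 1. pick a value for 'b'
  for (let i = 0; i < maxIterations; i++) {
    // 2. slope of MSE with respect to b: (2/n) * Σ (guess - actual)
    const slope =
      (2 / labels.length) * labels.reduce((sum, label) => sum + (b - label), 0);
    if (Math.abs(slope) < 1e-6) break; // 3. slope very, very small -> done
    b -= slope * learningRate;         // 4 & 5. subtract slope * learning rate
  }                                    // 6. loop back to step 2
  return b;
}

descendB([200, 230, 245, 274, 259, 262]); // ≈ 245, the mean of the labels
```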
[Gradient Descent] Sheet on MSE graph.xlsx
- Why worry about derivatives? Just calculate MSE twice and compare the two values?
  - The slope of MSE is already doing that calculation
- We want a slope of 0, so why not set the derivative equal to 0 and solve for b?
- Pick a value for 'b' and 'm'
- Calculate the slope of MSE with respect to 'm' and 'b': derivative
- Is the slope very, very small? If yes, we are done!
- Multiply the slope by an arbitrary small value called a 'learning rate'
- Subtract that from 'b' and 'm'
- Go back to 2
Miles Per Gallon = m * (Car Horsepower) + b
Matrix multiplication: a linear algebra operation between two matrices (tensors)
- Are two matrices eligible to be multipled together?
- What's the output of matrix multiplication?
- How is matrix multiplication done?
- shape [4, 2] and shape [2, 3]
- Inner shape values are the same -> Eligible for matrix multiplication
- [4, 3]
matrix_a = [
[1, 5],
[2, 6],
[3, 7],
[4, 8]
];
matrix_b = [
[10, 30, 50],
[20, 40, 60]
];

Matrix C:
| 1*10 + 5*20 = 110 | 1*30 + 5*40 = 230 | 1*50 + 5*60 = 350 |
| 2*10 + 6*20 = 140 | 2*30 + 6*40 = 300 | 2*50 + 6*60 = 460 |
| 3*10 + 7*20 = 170 | 3*30 + 7*40 = 370 | 3*50 + 7*60 = 570 |
| 4*10 + 8*20 = 200 | 4*30 + 8*40 = 440 | 4*50 + 8*60 = 680 |
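The computation above in plain JS (illustrative; with tfjs this is `tensorA.matMul(tensorB)`):

```javascript
// Matrix multiplication: [4, 2] x [2, 3] -> [4, 3].
// Each output cell is the dot product of a row of `a` and a column of `b`.
function matMul(a, b) {
  return a.map(row =>
    b[0].map((_, col) =>
      row.reduce((sum, value, i) => sum + value * b[i][col], 0)
    )
  );
}

const matrixA = [[1, 5], [2, 6], [3, 7], [4, 8]];
const matrixB = [[10, 30, 50], [20, 40, 60]];
matMul(matrixA, matrixB);
// → [[110, 230, 350], [140, 300, 460], [170, 370, 570], [200, 440, 680]]
```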
- Slope of MSE with respect to M and B:
  (Transpose(Features) * ((Features * Weights) - Labels)) / n
  - Features: Tensor of our feature data
  - Weights: M and B in a tensor
  - Labels: Tensor of our label data
  - n: Number of observations
Engine HorsePower = [
// [engine horse, arbitrary column of 1's]
[x1, 1],
[x2, 1],
[x3, 1],
[x4, 1],
[x5, 1],
[x6, 1],
] // shape [6, 2]
Weights = [
[m],
[b]
] // [2, 1]

Engine HorsePower * Weights:

| -> [6, 1] |
|---|
| m * x1 + b |
| m * x2 + b |
| m * x3 + b |
| m * x4 + b |
| m * x5 + b |
| m * x6 + b |
// 1.
Transposed Engine HorsePower = [
[x1, x2, x3, x4, x5, x6],
[ 1, 1, 1, 1, 1, 1]
] // [2, 6]
// 2. differences = (Features * Weights) - Labels  // shape [6, 1]
// d = difference

// 3. Matrix multiplication: Transposed Features * differences
| -> [2, 1] |
|---|
| x1 * d1 + x2 * d2 + x3 * d3 + x4 * d4 + x5 * d5 + x6 * d6 |
| d1 + d2 + d3 + d4 + d5 + d6 |
| it means |
|---|
| mSlope = .sum(horsepower * difference) |
| bSlope = .sum(difference) |
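The same slope math in plain JS, assuming `features` already includes the appended column of 1's (a sketch of the equation; the course uses tfjs tensors instead):

```javascript
// slopes = (Transpose(Features) * ((Features * Weights) - Labels)) / n
function gradientSlopes(features, weights, labels) {
  const n = features.length;
  // Features * Weights -> current guesses, shape [n, 1]
  const guesses = features.map(row =>
    row.reduce((sum, v, i) => sum + v * weights[i][0], 0)
  );
  const differences = guesses.map((guess, i) => guess - labels[i][0]);
  // Transpose(Features) * differences, divided by n -> one slope per weight
  return features[0].map((_, col) => [
    features.reduce((sum, row, i) => sum + row[col] * differences[i], 0) / n,
  ]);
}

const features = [[1, 1], [2, 1]]; // [x, 1] rows
const weights = [[0], [0]];        // [[m], [b]]
const labels = [[2], [4]];
gradientSlopes(features, weights, labels); // → [[-5], [-3]]
```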
- Refactor constructor to make 'features' and 'labels' into tensors
- Append a column of ones to the feature tensor
- Make a tensor for our weights as well
- Refactor 'gradientDescent' function to use the new equation
Google: vectorized gradient descent in linear regression
Coefficient of Determination
R ** 2 = 1 - (SS_res / SS_tot)
- SS_tot: Total sum of squares, sum of (Actual - Average) ** 2
- SS_res: Sum of squares of residuals, sum of (Actual - Predicted) ** 2
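A sketch of the metric (illustrative helper):

```javascript
// Coefficient of determination: R² = 1 - SSres / SStot.
function r2(actual, predicted) {
  const average = actual.reduce((a, b) => a + b, 0) / actual.length;
  const ssTot = actual.reduce((sum, a) => sum + (a - average) ** 2, 0);
  const ssRes = actual.reduce((sum, a, i) => sum + (a - predicted[i]) ** 2, 0);
  return 1 - ssRes / ssTot;
}

r2([1, 2, 3], [1, 2, 3]); // 1 (perfect predictions)
r2([1, 2, 3], [2, 2, 2]); // 0 (no better than always guessing the mean)
```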
node index.js
# R2 is -3.0282658720681175

node index.js
# R2 is -10.938349176819127

node index.js
# R2 is 0.6048547748640769

MPG = b + (m1 * Weight) + (m2 * Displacement) + (m3 * Horsepower)
- Univariate Linear Regression: y = b + (m * x)
- Multivariate Linear Regression: y = b + (m1 * x1) + (m2 * x2) + (m3 * x3)
learningRate: 1 -> R2 is -Infinity
learningRate: 0.01 -> R2 is -0.8926304296686307
learningRate: 0.5 -> R2 is 0.658514569203041
learningRate: 0.1 -> R2 is 0.6609495536468749
iterations: 1000 -> R2 is 0.6581457923927724
iterations: 100 -> R2 is 0.6609495536468749
- Adam
- Adagrad
- RMSProp
- Momentum
However, those methods are a bit too complicated for our case
- With every iteration of GD, calculate the exact value of MSE and store it
- After running an iteration of GD, look at the current MSE and the old MSE
- If the MSE went up then we did a bad update, so divide learning rate by 2
- If the MSE went down then we are going in the right direction! Increase LR by 5%
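That adjustment logic can be sketched as follows, assuming `mseHistory` stores one MSE value per iteration (a hypothetical structure for illustration):

```javascript
// Custom learning-rate schedule: halve on a worse MSE, grow 5% on a better one.
function updateLearningRate(mseHistory, learningRate) {
  if (mseHistory.length < 2) return learningRate; // nothing to compare yet
  const current = mseHistory[mseHistory.length - 1];
  const previous = mseHistory[mseHistory.length - 2];
  return current > previous ? learningRate / 2 : learningRate * 1.05;
}

updateLearningRate([10, 12], 0.1); // 0.05    (MSE went up -> bad update, halve)
updateLearningRate([12, 10], 0.1); // ≈ 0.105 (MSE went down -> increase by 5%)
```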
Now the initial learningRate does not matter, as it will be adjusted during training.
The plot will help us easily figure out how many iterations are enough.
- Gradient Descent: [6, 4]
- Use entire feature set to update M and B
- Batch Gradient Descent: [3, 4]
- Use a couple observations at a time to update M and B
- Stochastic Gradient Descent: [1, 4]
- Use one observation at a time to update M and B
With a lot more data, we could see the performance improvement more clearly
// Batch Gradient Descent
{
  "iterations": 3,
  "batchSize": 10
}

// Stochastic Gradient Descent
{
  "iterations": 3,
  "batchSize": 1
}

- Linear Regression: Predicts continuous values
- Logistic Regression: Predicts discrete values (classification)
- Basic logistic regression - binary classification
  - pass / not pass
  - spam / not spam
  - customer accepts / declines
  - apple phone / android phone
e = Euler's number = 2.71828...
- Encode label values as either '0' or '1'
- Guess a starting value of B and M (and M2, M3, etc)
- Calculate the slope of MSE using all observations in feature set and current M/B values
- Multiply the slope by learning rate
- Update B and M
- Go back to 3
- Metric of how bad we guessed (Cross Entropy):
  CE = -(1/n) * Σ [ Actual * log(Guess) + (1 - Actual) * log(1 - Guess) ]
  - Actual: Encoded label value
  - Guess: Our guess, sigmoid(mx + b)
  - n: Number of observations
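Sigmoid and cross entropy sketched in plain JS (illustrative helpers, not the course's tensor code):

```javascript
// Sigmoid squashes mx + b into a probability between 0 and 1.
const sigmoid = z => 1 / (1 + Math.E ** -z);

// Cross entropy: how bad our guesses are against 0/1 encoded labels.
function crossEntropy(actuals, guesses) {
  return (
    -actuals.reduce(
      (sum, actual, i) =>
        sum +
        actual * Math.log(guesses[i]) +
        (1 - actual) * Math.log(1 - guesses[i]),
      0
    ) / actuals.length
  );
}

sigmoid(0);                       // 0.5
crossEntropy([1, 0], [0.9, 0.1]); // ≈ 0.105 (good guesses -> low cost)
```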
- MSE has one minimum value
- Given a person's number of hours spent driving per day, what type of car do they prefer?
- Luxury
- Sedan
- Truck
- Compact
node index.js
# [[1, 1, 0],]

- Sigmoid : Marginal Probability Distribution (this is what currently happens)
  - Considers one possible output case in isolation
- Softmax : Conditional Probability Distribution (e.g. rolling a die)
- Considers all possible output cases together
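A plain-JS sketch of what tfjs's `softmax()` computes:

```javascript
// Softmax: exponentiate each value, then divide by the sum, so all outputs
// are positive and sum to 1 -- a probability over all cases together.
function softmax(values) {
  const exps = values.map(v => Math.E ** v);
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

softmax([1, 1, 1]); // → [≈1/3, ≈1/3, ≈1/3] (equal scores, equal probabilities)
```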
- npm package: mnist-data
  - 60,000 images in total
  - 28 x 28 pixel image data
node index.js
# Accuracy is 0.08

The accuracy is very disappointing.
node --inspect-brk index.js
- Remove the columns with only zeros from our feature set - they don't provide any benefit anyway
- Change our method of standardization/normalization to better account for possible all zero values
variance.cast('bool').logicalNot().cast('float32').print();

node index.js
# Accuracy is 0.88

node index.js
# Accuracy is 0.867
# if it reaches the heap size limit
node --max-old-space-size=4096 index.js
node
> v8.getHeapStatistics()
{
...,
heap_size_limit: 4345298944,
...
}

node --inspect-brk memory.js

- Chrome Inspect -> Memory -> Heap snapshot -> Take snapshot
The shallow size of the array is reduced from 9,108k to 850k
-> because of the Javascript Garbage Collector
- Shallow size (of an array): the memory the object itself occupies
- Retained size: the memory held alive through references to the object
Hmm... there must have been some changes in the TensorFlow library.
Tensors don't use much memory here, unlike in the lecture.
tf.ENV.registry.webgl.backend.textData.data;
// Can't find it in my version, "4.1.0"

Well, there's not much difference.
I guess TensorFlow already optimizes this now.
node index.js
node index.js
# Accuracy is 0.9257

- Node.js debugging using Chrome:
  - node --inspect-brk index.js
  - navigate to about:inspect or chrome://inspect