Machine Learning with Javascript by Stephen Grider
- ../resources
- MLKits - starter kits
- MLCasts - complete code
- 01-introduction
- plinko
- 02-algorithm
- plinko : using lodash
node index.js
- 03-tensorflow: tensorflow features for house price project
- 04-tf-app
- knn-tf-house-price: house price
- 06-gradient-descent
- regressions
- 07-vectorized
- regressions
- 08-plot
- regressions
- 09: batch stochastic gradient descent
- regressions : (linear complete)
- 10-natural-binary-classification
- regressions
- linear-regression (complete)
- logistic-regression
- regressions
- 11-multi-value-classifications
- regressions/multinominal-logistic-regression (complete)
- 12-image-recognition
- regressions/multinominal-logistic-regressions
- 13-performance-optimization
- regressions/multinominal-logistic-regressions (image-recognition complete)
- 14-loadcsv
- loadcsv: csv loading project
- Identify the independent and dependent variables
Features are categories of data points that affect the value of a label
- Assemble a set of data related to the problem you're trying to solve
- Datasets almost always require cleanup and formatting
- Decide on the type of output you are predicting
Regression is used with continuous values; classification is used with discrete values
- Based on type of output, pick an algorithm that will determine a correlation between your features and labels
- Many, many different algorithms exist, each with pros and cons
- Use model generated by algorithm to make a prediction
- Models relate the value of features to the value of labels
- Classification: a few or two options
  - The value of our labels belongs to a discrete set
- Regression: a range
  - The value of our labels belongs to a continuous set
| Feature | Label |
|---|---|
| Drop Position | Bucket a ball lands in |
| Ball Bounciness | |
| Ball Size | |

Changing one of the features will probably change the label
- We choose Classification -> Bucket #1 ~ #10
- Algorithm: K-Nearest Neighbors (KNN)
- "Birds of a feather flock together"
K-Nearest Neighbors (with one independent variable)
- Adjust the parameters of the analysis (e.g. const k = 3;)
- Add more features to explain the analysis
- Change the prediction point
- Accept that maybe there isn't a good correlation
Start with option 2: "Add more features to explain the analysis"
Pythagorean Theorem: a^2 + b^2 = c^2
C = (A ** 2 + B ** 2) ** 0.5
const outputs = [
[40, 0.5, 16, 1],
[150, 0.52, 16, 2],
[350, 0.55, 16, 2],
[425, 0.53, 16, 3]
];
const target = [323, 0.52, 16, 2];
C ** 2 = A ** 2 + B ** 2
C = (A ** 2 + B ** 2) ** 0.5
C = (((350 - 323) ** 2) + ((0.55 - 0.52) ** 2)) ** 0.5

3D Pythagorean Theorem: a^2 + b^2 + c^2 = d^2
D = (A ** 2 + B ** 2 + C ** 2) ** 0.5
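The 2D and 3D distance formulas generalize to any number of features. A minimal plain-JS sketch (a hypothetical helper, not the course's code):

```javascript
// Euclidean distance between two points with any number of features.
function distance(pointA, pointB) {
  return (
    pointA
      .map((value, i) => (value - pointB[i]) ** 2) // squared difference per feature
      .reduce((sum, sq) => sum + sq, 0) ** 0.5     // square root of the sum
  );
}

distance([350, 0.55], [323, 0.52]); // ≈ 27, matching the 2D example above
```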
Normalized Dataset = (FeatureValue - min) / (max - min)
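That formula can be applied to a column of values; a sketch (illustrative helper, not the course's implementation):

```javascript
// Min-max normalization: rescale a column of feature values to the 0-1 range.
function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map(value => (value - min) / (max - min));
}

normalize([0, 5, 10]); // → [0, 0.5, 1]
```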
Not all features give us a good guess.
Some features are not giving us good accuracy
the length of data: 1596
score.js:47 For feature of 0 accuracy: 0.32
score.js:47 For feature of 1 accuracy: 0.15
score.js:47 For feature of 2 accuracy: 0.03

dropPosition is a good selection feature;
bounciness and size are not.
- Features vs Labels
- Test vs Training sets of data
- Feature Normalization
- Common data structures (arrays of arrays)
- Feature Selection
Lodash
- Pros
- Methods for just about everything we need
- Excellent API design (especially chain!)
- Skills transferrable to other JS projects
- Cons
- Extremely slow (relatively)
- Not 'numbers' focused
- Some things are awkward (getting a column of values)
TensorFlow JS
- Pros
- Similar API to Lodash
- Extremely fast for numeric calculations
- Has a 'low level' linear algebra API + higher level API for ML
- Similar API to numpy, a popular Python numerical lib
- Cons
- Still in active development
[]: 1 dimensional
[[]]: 2 dimensional
[[[]]]: 3 dimensional

// 1 dimensional
[5, 10, 17] -> shape [3]

// 2 dimensional
[
  [5, 10, 17],
  [5, 10, 17],
] -> shape [2, 3]

// 3 dimensional
[
  [
    [5, 10, 17],
  ],
] -> shape [1, 1, 3]

2D is the most important dimension we will work with:
[# rows, # columns] -> [2, 3]
Broadcasting works when:
- Taking the shapes of both tensors, from right to left the dimensions are equal or one of them is 1
- Shape [3] and Shape [1] => O
  - [3]
  - [1]
- Shape [2, 3] and Shape [2, 1] => O
  - [2, 3]
  - [2, 1]
- Shape [2, 3, 2] and Shape [3, 1] => O
  - [2, 3, 2]
  - [   3, 1]
- Shape [2, 3, 2] and Shape [2, 1] => X
  - [2, 3, 2]
  - [   2, 1]
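The rule above can be expressed as a small checker (a plain-JS sketch of the rule, not a tfjs API):

```javascript
// Broadcasting check: align shapes from the right; every pair of dimensions
// must be equal, or one of the pair must be 1.
function canBroadcast(shapeA, shapeB) {
  const [longer, shorter] =
    shapeA.length >= shapeB.length ? [shapeA, shapeB] : [shapeB, shapeA];
  const offset = longer.length - shorter.length;
  return shorter.every(
    (dim, i) => dim === longer[i + offset] || dim === 1 || longer[i + offset] === 1
  );
}

canBroadcast([2, 3], [2, 1]);    // true
canBroadcast([2, 3, 2], [3, 1]); // true
canBroadcast([2, 3, 2], [2, 1]); // false (3 vs 2 on the middle axis)
```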
- Which bucket will a ball go into? -> Classification
- What is the price of a house? -> Regression
- Find distance between features and prediction point
- Sort from lowest point to greatest
- Take the top K records
- Average the label value of those top K records
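The four steps above can be sketched in plain JS, assuming `features` is an array of rows and `labels` holds the matching label values (a hypothetical layout, for illustration):

```javascript
// KNN regression: predict a label by averaging the K nearest observations.
function knn(features, labels, predictionPoint, k) {
  return (
    features
      .map((row, i) => [
        // 1. distance between each row and the prediction point
        row.reduce((sum, v, j) => sum + (v - predictionPoint[j]) ** 2, 0) ** 0.5,
        labels[i],
      ])
      .sort((a, b) => a[0] - b[0]) // 2. sort from lowest distance to greatest
      .slice(0, k)                 // 3. take the top K records
      .reduce((sum, [, label]) => sum + label, 0) / k // 4. average their labels
  );
}

knn([[300], [350], [400]], [1, 2, 3], [340], 2); // → 1.5 (average of labels 2 and 1)
```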
// plinko
// features and bucket were in the same structure
[
[350, 0.55, 16, 2],
[350, 0.55, 16, 2]
];
// house-price
// features and labels are separated
const features = [
[84, 83],
[84.1, 85]
];
const housePrice = [[200], [250]];

distance = ((lonA - lonB) ** 2 + (latA - latB) ** 2) ** 0.5
- tf.unstack
- turns the tensor data into a normal JavaScript array
npm install --save @tensorflow/tfjs-node lodash shuffle-seed

Initial analysis
Error: 15% Guess: 925420 , Expected 1085000
Error: -36% Guess: 636235 , Expected 466800
Error: -11% Guess: 472810 , Expected 425000
Error: -23% Guess: 695514.3 , Expected 565000
Error: 21% Guess: 600730 , Expected 759000
Error: -12% Guess: 573287.2 , Expected 512031
Error: -1% Guess: 773849.5 , Expected 768000
Error: 75% Guess: 381626.2 , Expected 1532500
Error: -199% Guess: 613175 , Expected 204950
Error: -71% Guess: 423569.9 , Expected 247000

For example, if one value is extremely high or low,
normalization wouldn't mean much.
Then standardization would be a better option:
(Value - Average) / StandardDeviation
StandardDeviation = sqrt(variance)
StandardDeviation = variance ** 0.5
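A sketch of standardization over one column (illustrative helper, not the course's tensor version):

```javascript
// Standardization: (value - mean) / standardDeviation,
// where standardDeviation = variance ** 0.5.
function standardize(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return values.map(v => (v - mean) / variance ** 0.5);
}

standardize([2, 4, 4, 4, 5, 5, 7, 9]); // mean 5, std 2 → first value (2 - 5) / 2 = -1.5
```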
# dataColumns: ['lat', 'long', 'sqft_lot'],
Error: -15% Guess: 1245050 , Expected 1085000
Error: -64% Guess: 765837.1 , Expected 466800
Error: -100% Guess: 848675 , Expected 425000
Error: -38% Guess: 781742 , Expected 565000
Error: -3% Guess: 781470 , Expected 759000
Error: 0% Guess: 514000 , Expected 512031
Error: -6% Guess: 814785 , Expected 768000
Error: 49% Guess: 774700 , Expected 1532500
Error: -19% Guess: 243402.5 , Expected 204950
Error: 2% Guess: 242865 , Expected 247000

node --inspect-brk index.js

Then navigate to chrome://inspect in the browser.
We can inspect the code using breakpoints and the console.
features.sub(mean).div(variance.pow(0.5)).print();
features
.sub(mean)
.div(variance.pow(0.5))
// .sub(predictionPoint)
.sub(scaledPrediction)
.pow(2)
.sum(1)
.pow(0.5)
.print();

# tremendous improvement!
# dataColumns: ['lat', 'long', 'sqft_lot', 'sqft_living'],
node index.js
Error: -15% Guess: 1251260 , Expected 1085000
Error: -11% Guess: 519756.5 , Expected 466800
Error: -2% Guess: 433700 , Expected 425000
Error: 19% Guess: 455800 , Expected 565000
Error: 8% Guess: 699750 , Expected 759000
Error: -14% Guess: 584260 , Expected 512031
Error: -9% Guess: 835450 , Expected 768000
Error: 13% Guess: 1329790 , Expected 1532500
Error: -36% Guess: 279422.5 , Expected 204950
Error: 7% Guess: 228767.5 , Expected 247000

Linear Regression
- Pros
- Fast! Only train one time, then use for any prediction
- Uses methods that will be very important in more complicated ML
- Cons
- Lot harder to understand intuitively
price = 200 * Lot Size + 3000
You can create a chart and add a trend line based on the data (using an equation).
But that only works for one independent variable and one dependent variable.
With linear regression, we can use an arbitrary number of independent variables for one output.
- Ordinary Least Squares
- Generalized Least Squares
- ...others
- Gradient Descent
MSE = (1/n) * Σ (Y_i - Ŷ_i)²
- MSE = mean squared error
- n = number of data points
- Y_i = observed values
- Ŷ_i = predicted values
- Bad guess: y = 0x + 1
  - Mean Squared Error (how wrong were we?):
    ((1-200)**2 + (1-230)**2 + (1-245)**2 + (1-274)**2 + (1-259)**2 + (1-262)**2) / 6 = 360792 / 6 = 60132
- Better guess: y = 0x + 200
  - Mean Squared Error:
    ((200-200)**2 + (200-230)**2 + (200-245)**2 + (200-274)**2 + (200-259)**2 + (200-262)**2) / 6 = 15726 / 6 = 2621
Price = m * Lot Size + b
- 'm' and 'b' will be as correct as they can be when MSE is as low as possible
- We need to find the lowest MSE
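The two MSE calculations above, sketched in JS with the data from the example:

```javascript
// Mean squared error of a set of guesses against the observed labels.
function meanSquaredError(labels, guesses) {
  return (
    labels.reduce((sum, label, i) => sum + (label - guesses[i]) ** 2, 0) /
    labels.length
  );
}

const labels = [200, 230, 245, 274, 259, 262];
meanSquaredError(labels, labels.map(() => 1));   // 60132 (bad guess, y = 0x + 1)
meanSquaredError(labels, labels.map(() => 200)); // 2621  (better guess, y = 0x + 200)
```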
Price = m * Lot Size + b
- Don't know the possible range of b
- Don't know a step size for incrementing b
- Huge computational demands when adding in more features
- Wolfram Alpha - Computational Intelligence
  - search: y = x^2 + 5
  - search: derivative x^2 + 5
  - On the derivative's plot, the y value at any x is the slope of the original curve
- Pick a value for 'b'
- Calculate the slope of MSE with respect to 'b' (the derivative)
- Is the slope very, very small? If yes, we are done!
- Multiply the slope by an arbitrary small value called a 'learning rate'
- Subtract that from 'b'
- Go back to 2
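Those steps, sketched for a model that only learns 'b' (y = 0x + b). The learning rate and stopping threshold are arbitrary, and the slope of MSE with respect to b is (2/n) * Σ (guess - actual):

```javascript
// Gradient descent on 'b' alone, following the steps above.
function descendB(labels, learningRate = 0.1, maxIterations = 1000) {
  let b = 0; // 1. pick a value for 'b'
  for (let i = 0; i < maxIterations; i++) {
    // 2. slope of MSE with respect to b: (2/n) * Σ (guess - actual)
    const slope =
      (2 / labels.length) * labels.reduce((sum, label) => sum + (b - label), 0);
    if (Math.abs(slope) < 1e-6) break; // 3. slope very, very small -> done
    b -= slope * learningRate;         // 4 & 5. subtract slope * learning rate
  }                                    // 6. loop back to step 2
  return b;
}

descendB([200, 230, 245, 274, 259, 262]); // ≈ 245, the mean of the labels
```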
[Gradient Descent] Sheet on MSE graph.xlsx
- Why worry about derivatives? Just calculate MSE twice and compare the two values?
  - The slope of MSE is already doing that calculation
- We want a slope of 0, so why not set the derivative equal to 0 and solve for b?
- Pick a value for 'b' and 'm'
- Calculate the slope of MSE with respect to 'm' and 'b': derivative
- Is the slope very, very small? If yes, we are done!
- Multiply the slope by an arbitrary small value called a 'learning rate'
- Subtract that from 'b' and 'm'
- Go back to 2
Miles Per Gallon = m * (Car Horsepower) + b
Matrix multiplication: a linear algebra operation between two matrices (tensors)
- Are two matrices eligible to be multipled together?
- What's the output of matrix multiplication?
- How is matrix multiplication done?
- shape [4, 2] and shape [2, 3]
- Inner shape values are the same -> Eligible for matrix multiplication
- [4, 3]
matrix_a = [
[1, 5],
[2, 6],
[3, 7],
[4, 8]
];
matrix_b = [
[10, 30, 50],
[20, 40, 60]
];

Matrix C:
| 1*10 + 5*20 = 110 | 1*30 + 5*40 = 230 | 1*50 + 5*60 = 350 |
| 2*10 + 6*20 = 140 | 2*30 + 6*40 = 300 | 2*50 + 6*60 = 460 |
| 3*10 + 7*20 = 170 | 3*30 + 7*40 = 370 | 3*50 + 7*60 = 570 |
| 4*10 + 8*20 = 200 | 4*30 + 8*40 = 440 | 4*50 + 8*60 = 680 |
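The computation above in plain JS (illustrative; with tfjs this is `tensorA.matMul(tensorB)`):

```javascript
// Matrix multiplication: [4, 2] x [2, 3] -> [4, 3].
// Each output cell is the dot product of a row of `a` and a column of `b`.
function matMul(a, b) {
  return a.map(row =>
    b[0].map((_, col) =>
      row.reduce((sum, value, i) => sum + value * b[i][col], 0)
    )
  );
}

const matrixA = [[1, 5], [2, 6], [3, 7], [4, 8]];
const matrixB = [[10, 30, 50], [20, 40, 60]];
matMul(matrixA, matrixB);
// → [[110, 230, 350], [140, 300, 460], [170, 370, 570], [200, 440, 680]]
```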
- Slope of MSE with respect to M and B:
  (Transpose(Features) * ((Features * Weights) - Labels)) / n
  - Features: Tensor of our feature data
  - Weights: M and B in a tensor
  - Labels: Tensor of our label data
  - n: Number of observations
Engine HorsePower = [
// [engine horse, arbitrary column of 1's]
[x1, 1],
[x2, 1],
[x3, 1],
[x4, 1],
[x5, 1],
[x6, 1],
] // shape [6, 2]
Weights = [
[m],
[b]
] // [2, 1]

Engine HorsePower * Weights:

| -> [6, 1] |
|---|
| m * x1 + b |
| m * x2 + b |
| m * x3 + b |
| m * x4 + b |
| m * x5 + b |
| m * x6 + b |
// 1.
Transposed Engine HorsePower = [
[x1, x2, x3, x4, x5, x6],
[ 1, 1, 1, 1, 1, 1]
] // [2, 6]
// 2. differences = (Features * Weights) - Labels  // shape [6, 1]
// d = difference

// 3. Matrix multiplication: Transposed Features * differences
| -> [2, 1] |
|---|
| x1 * d1 + x2 * d2 + x3 * d3 + x4 * d4 + x5 * d5 + x6 * d6 |
| d1 + d2 + d3 + d4 + d5 + d6 |
| it means |
|---|
| mSlope = .sum(horsepower * difference) |
| bSlope = .sum(difference) |
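The same slope math in plain JS, assuming `features` already includes the appended column of 1's (a sketch of the equation; the course uses tfjs tensors instead):

```javascript
// slopes = (Transpose(Features) * ((Features * Weights) - Labels)) / n
function gradientSlopes(features, weights, labels) {
  const n = features.length;
  // Features * Weights -> current guesses, shape [n, 1]
  const guesses = features.map(row =>
    row.reduce((sum, v, i) => sum + v * weights[i][0], 0)
  );
  const differences = guesses.map((guess, i) => guess - labels[i][0]);
  // Transpose(Features) * differences, divided by n -> one slope per weight
  return features[0].map((_, col) => [
    features.reduce((sum, row, i) => sum + row[col] * differences[i], 0) / n,
  ]);
}

const features = [[1, 1], [2, 1]]; // [x, 1] rows
const weights = [[0], [0]];        // [[m], [b]]
const labels = [[2], [4]];
gradientSlopes(features, weights, labels); // → [[-5], [-3]]
```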
- Refactor constructor to make 'features' and 'labels' into tensors
- Append a column of ones to the feature tensor
- Make a tensor for our weights as well
- Refactor 'gradientDescent' function to use the new equation
Google: vectorized gradient descent in linear regression
Coefficient of Determination
R ** 2 = 1 - (SS_res / SS_tot)
- SS_tot: Total sum of squares, sum of (Actual - Average) ** 2
- SS_res: Sum of squares of residuals, sum of (Actual - Predicted) ** 2
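A sketch of the metric (illustrative helper):

```javascript
// Coefficient of determination: R² = 1 - SSres / SStot.
function r2(actual, predicted) {
  const average = actual.reduce((a, b) => a + b, 0) / actual.length;
  const ssTot = actual.reduce((sum, a) => sum + (a - average) ** 2, 0);
  const ssRes = actual.reduce((sum, a, i) => sum + (a - predicted[i]) ** 2, 0);
  return 1 - ssRes / ssTot;
}

r2([1, 2, 3], [1, 2, 3]); // 1 (perfect predictions)
r2([1, 2, 3], [2, 2, 2]); // 0 (no better than always guessing the mean)
```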
node index.js
# R2 is -3.0282658720681175

node index.js
# R2 is -10.938349176819127

node index.js
# R2 is 0.6048547748640769

MPG = b + (m1 * Weight) + (m2 * Displacement) + (m3 * Horsepower)
- Univariate Linear Regression: y = b + (m * x)
- Multivariate Linear Regression: y = b + (m1 * x1) + (m2 * x2) + (m3 * x3)
learningRate: 1 -> R2 is -Infinity
learningRate: 0.01 -> R2 is -0.8926304296686307
learningRate: 0.5 -> R2 is 0.658514569203041
learningRate: 0.1 -> R2 is 0.6609495536468749
iterations: 1000 -> R2 is 0.6581457923927724
iterations: 100 -> R2 is 0.6609495536468749
- Adam
- Adagrad
- RMSProp
- Momentum
However, those methods are a bit too complicated for our case
- With every iteration of GD, calculate the exact value of MSE and store it
- After running an iteration of GD, look at the current MSE and the old MSE
- If the MSE went up then we did a bad update, so divide learning rate by 2
- If the MSE went down then we are going in the right direction! Increase LR by 5%
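That adjustment logic can be sketched as follows, assuming `mseHistory` stores one MSE value per iteration (a hypothetical structure for illustration):

```javascript
// Custom learning-rate schedule: halve on a worse MSE, grow 5% on a better one.
function updateLearningRate(mseHistory, learningRate) {
  if (mseHistory.length < 2) return learningRate; // nothing to compare yet
  const current = mseHistory[mseHistory.length - 1];
  const previous = mseHistory[mseHistory.length - 2];
  return current > previous ? learningRate / 2 : learningRate * 1.05;
}

updateLearningRate([10, 12], 0.1); // 0.05    (MSE went up -> bad update, halve)
updateLearningRate([12, 10], 0.1); // ≈ 0.105 (MSE went down -> increase by 5%)
```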
Now the initial learningRate does not matter, as it will be adjusted during training.
The plot will help us easily figure out how many iterations are enough.
- Gradient Descent: [6, 4]
- Use entire feature set to update M and B
- Batch Gradient Descent: [3, 4]
- Use a couple observations at a time to update M and B
- Stochastic Gradient Descent: [1, 4]
- Use one observation at a time to update M and B
With a lot more data, we could see the performance improvement more clearly
// Batch Gradient Descent
{
  "iterations": 3,
  "batchSize": 10
}

// Stochastic Gradient Descent
{
  "iterations": 3,
  "batchSize": 1
}

- Linear Regression: Predicts continuous values
- Logistic Regression: Predicts discrete values (classification)
- Basic logistic regression - binary classification
  - pass / not pass
  - spam / not spam
  - customer accepts / declines
  - apple phone / android phone
e = Euler's number = 2.71828...
- Encode label values as either '0' or '1'
- Guess a starting value of B and M (and M2, M3, etc)
- Calculate the slope of MSE using all observations in feature set and current M/B values
- Multiply the slope by learning rate
- Update B and M
- Go back to 3
- Metric of how bad we guessed (Cross Entropy):
  CE = -(1/n) * Σ [ Actual * log(Guess) + (1 - Actual) * log(1 - Guess) ]
  - Actual: Encoded label value
  - Guess: Our guess, sigmoid(mx + b)
  - n: Number of observations
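Sigmoid and cross entropy sketched in plain JS (illustrative helpers, not the course's tensor code):

```javascript
// Sigmoid squashes mx + b into a probability between 0 and 1.
const sigmoid = z => 1 / (1 + Math.E ** -z);

// Cross entropy: how bad our guesses are against 0/1 encoded labels.
function crossEntropy(actuals, guesses) {
  return (
    -actuals.reduce(
      (sum, actual, i) =>
        sum +
        actual * Math.log(guesses[i]) +
        (1 - actual) * Math.log(1 - guesses[i]),
      0
    ) / actuals.length
  );
}

sigmoid(0);                       // 0.5
crossEntropy([1, 0], [0.9, 0.1]); // ≈ 0.105 (good guesses -> low cost)
```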
- MSE has one minimum value
- Given a person's number of hours spent driving per day, what type of car do they prefer?
- Luxury
- Sedan
- Truck
- Compact
node index.js
# [[1, 1, 0],]

- Sigmoid : Marginal Probability Distribution (this is what currently happens)
  - Considers one possible output case in isolation
- Softmax : Conditional Probability Distribution (e.g. rolling a die)
- Considers all possible output cases together
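A plain-JS sketch of what tfjs's `softmax()` computes:

```javascript
// Softmax: exponentiate each value, then divide by the sum, so all outputs
// are positive and sum to 1 -- a probability over all cases together.
function softmax(values) {
  const exps = values.map(v => Math.E ** v);
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

softmax([1, 1, 1]); // → [≈1/3, ≈1/3, ≈1/3] (equal scores, equal probabilities)
```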
- npm package: mnist-data
  - 60,000 images in total
  - 28 x 28 pixel image data
node index.js
# Accuracy is 0.08

The accuracy is very disappointing.
node --inspect-brk index.js
- Remove the columns with only zeros from our feature set - they don't provide any benefit anyway
- Change our method of standardization/normalization to better account for possible all zero values
variance.cast('bool').logicalNot().cast('float32').print();

node index.js
# Accuracy is 0.88

node index.js
# Accuracy is 0.867
# if it reaches the heap size limit
node --max-old-space-size=4096 index.js
node
> v8.getHeapStatistics()
{
...,
heap_size_limit: 4345298944,
...
}

node --inspect-brk memory.js

- Chrome Inspect -> Memory -> Heap snapshot -> Take snapshot
The shallow size of the array is reduced from 9,108k to 850k
-> because of the Javascript Garbage Collector
- Shallow size (of an array): the memory the object itself occupies
- Retained size: the memory held alive through references to the object
Hmm... there must have been some changes in the TensorFlow library.
Tensors don't use much memory here, unlike in the lecture.
tf.ENV.registry.webgl.backend.textData.data;
// Can't find it in my version, "4.1.0"

Well, there's not much difference.
I guess TensorFlow already optimizes this now.
node index.js
node index.js
# Accuracy is 0.9257

- Node.js debugging using Chrome:
  - node --inspect-brk index.js
  - navigate to about:inspect or chrome://inspect