Lessons Learned while Implementing a Sparse Logistic Regression Algorithm in Apache Spark

This repository contains the code examples for the Spark Summit EU 2017 talk, as well as the slides.

The experiments used this dataset (the avazu-site.tr.bz2 file).
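The avazu files on the LIBSVM datasets page are distributed in LIBSVM text format (`label index:value index:value ...`, with 1-based indices). Assuming avazu-site.tr.bz2 follows that format, a minimal, hypothetical line parser into a sparse example could look like the sketch below. `LibSvmParser` and `Example` are illustrative names, not classes from this repository; in Spark, `sc.textFile` can read the `.bz2` file directly and each line would be passed through a function like this.

```scala
// Hypothetical sketch: parse one LIBSVM-format line into a sparse example.
object LibSvmParser {
  // Sparse example: parallel arrays of 0-based feature indices and their values.
  final case class Example(label: Double, features: Array[Int], values: Array[Double])

  /** Parses "label idx:value idx:value ..." and converts 1-based indices to 0-based ones. */
  def parseLine(line: String): Example = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val sep = tok.indexOf(':')
      (tok.substring(0, sep).toInt - 1, tok.substring(sep + 1).toDouble)
    }
    Example(label, pairs.map(_._1), pairs.map(_._2))
  }
}
```

For instance, `parseLine("1 3:1 10:0.5")` yields label `1.0`, feature indices `[2, 9]` and values `[1.0, 0.5]`.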

Run any of the examples like this:

sbt "runMain me.lorand.slogreg.jobs.MyExampleJob <path_to_data> <number_of_iterations>"

Sparse matrix gradient descent

The optimization is implemented in me.lorand.slogreg.optimize.SparseMatrixGradientDescent
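All the optimizers in this repository minimize the logistic loss. For reference, assuming the standard unregularized formulation with labels $y_i \in \{0, 1\}$, sparse feature vectors $x_i$ and learning rate $\eta$ (the code may use a different label convention or add regularization), the objective, its gradient and the plain gradient descent update are:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
L(w) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big]

\nabla L(w) = \frac{1}{n} \sum_{i=1}^{n} \big( \sigma(w^\top x_i) - y_i \big)\, x_i,
\qquad
w_{t+1} = w_t - \eta \, \nabla L(w_t)
```

Because each $x_i$ is sparse, both $w^\top x_i$ and the term $(\sigma(w^\top x_i) - y_i)\, x_i$ only touch the coordinates where $x_i$ is non-zero, so the per-example work stays small even when the feature space is large.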

The job without a known partitioner is me.lorand.slogreg.jobs.SparseMatrixJob and the job with a known partitioner is me.lorand.slogreg.jobs.SparseMatrixPartitionerJob
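One common way to express this computation as a sparse matrix is to store the data as coordinate entries `(row, column, value)` and perform two joins per iteration: entries with weights on the column (feature) index to get one margin per row, then entries with residuals on the row index to get one gradient component per column. The sketch below assumes that layout; it is not necessarily how `SparseMatrixGradientDescent` stores its data. Plain Scala collections stand in for RDDs and `groupBy` stands in for the joins; `SparseMatrixSketch` and `Entry` are hypothetical names. In Spark, giving the joined RDDs the same known partitioner lets the join reuse the existing partitioning instead of shuffling both sides on every iteration.

```scala
// Hypothetical sketch: one logistic-regression gradient on a coordinate-format sparse matrix.
object SparseMatrixSketch {
  final case class Entry(row: Long, col: Int, value: Double)

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Mean log-loss gradient; labels are 0/1 keyed by row, missing weights count as 0. */
  def gradient(entries: Seq[Entry], labels: Map[Long, Double], weights: Map[Int, Double]): Map[Int, Double] = {
    // "Join" 1: entries with weights on the column index, summed into one margin per row
    val margins: Map[Long, Double] = entries
      .groupBy(_.row)
      .map { case (row, es) => row -> es.map(e => e.value * weights.getOrElse(e.col, 0.0)).sum }
    // Residual per row: sigmoid(margin) - label
    val residuals: Map[Long, Double] = labels.map { case (row, y) =>
      row -> (sigmoid(margins.getOrElse(row, 0.0)) - y)
    }
    val n = labels.size.toDouble
    // "Join" 2: entries with residuals on the row index, summed into one component per column
    entries
      .groupBy(_.col)
      .map { case (col, es) => col -> es.map(e => e.value * residuals(e.row)).sum / n }
  }
}
```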

Gradient descent without joins

The version without joins is implemented in me.lorand.slogreg.optimize.GradientDescent and the corresponding experiment is run by the job me.lorand.slogreg.jobs.GradientDescentJob
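A join-free alternative is to keep each example as a sparse row and ship the whole weight vector to the workers (in Spark, e.g. via a broadcast variable or the task closure), so margins are computed locally and only gradient contributions have to be combined. The sketch below assumes that design and uses plain Scala collections; `groupMapReduce` plays the role that something like `reduceByKey` would play on an RDD. `NoJoinGradientDescent` and its members are hypothetical names, not the repository's `GradientDescent` class.

```scala
// Hypothetical sketch: full-batch gradient step with a locally available dense weight vector.
object NoJoinGradientDescent {
  // Sparse example: parallel arrays of feature indices and values, label in {0, 1}.
  final case class Example(label: Double, features: Array[Int], values: Array[Double])

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Dot product that only touches the example's non-zero features
  private def dot(w: Array[Double], x: Example): Double = {
    var s = 0.0
    var k = 0
    while (k < x.features.length) { s += w(x.features(k)) * x.values(k); k += 1 }
    s
  }

  /** One step: each example emits sparse gradient terms, which are summed per feature. */
  def step(data: Seq[Example], w: Array[Double], learningRate: Double): Array[Double] = {
    val n = data.size.toDouble
    val grad: Map[Int, Double] = data
      .flatMap { x =>
        val r = sigmoid(dot(w, x)) - x.label
        x.features.indices.map(k => x.features(k) -> r * x.values(k))
      }
      .groupMapReduce(_._1)(_._2)(_ + _)
    val updated = w.clone()
    grad.foreach { case (j, g) => updated(j) -= learningRate * g / n }
    updated
  }
}
```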

Gradient descent with `aggregate`

Implemented in me.lorand.slogreg.optimize.AggregateGradientDescent
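Spark's `RDD.aggregate(zeroValue)(seqOp, combOp)` folds each partition with `seqOp` and then merges the per-partition results with `combOp`; both functions are allowed to modify and return their first argument, which makes a single dense gradient accumulator per partition cheap. The sketch below imitates that contract over a `Seq` of partitions in plain Scala, assuming an accumulator of (dense gradient sum, example count); it is not the code of `AggregateGradientDescent`, and `AggregateSketch` is a hypothetical name. The zero value is passed as a function so that each partition gets its own array, mirroring how Spark gives each task its own copy of `zeroValue`.

```scala
// Hypothetical sketch: computing the log-loss gradient with an aggregate-style fold.
object AggregateSketch {
  final case class Example(label: Double, features: Array[Int], values: Array[Double])

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Stand-in for RDD.aggregate: fold each partition with seqOp, then merge with combOp.
  def aggregate[T, U](partitions: Seq[Seq[T]], zero: () => U)(seqOp: (U, T) => U, combOp: (U, U) => U): U =
    partitions.map(_.foldLeft(zero())(seqOp)).foldLeft(zero())(combOp)

  def gradient(partitions: Seq[Seq[Example]], w: Array[Double]): Array[Double] = {
    val d = w.length
    // Accumulator: dense gradient sum plus the number of examples seen
    val (sum, count) = aggregate[Example, (Array[Double], Long)](partitions, () => (new Array[Double](d), 0L))(
      seqOp = { case ((acc, n), x) =>
        var margin = 0.0
        for (k <- x.features.indices) margin += w(x.features(k)) * x.values(k)
        val r = sigmoid(margin) - x.label
        // Update the accumulator in place, touching only the non-zero features
        for (k <- x.features.indices) acc(x.features(k)) += r * x.values(k)
        (acc, n + 1)
      },
      combOp = { case ((a, n1), (b, n2)) =>
        for (j <- 0 until d) a(j) += b(j)
        (a, n1 + n2)
      }
    )
    sum.map(_ / count)
  }
}
```

Compared with emitting one record per non-zero feature, this returns one dense array per partition, so the amount of data sent to the driver depends on the number of partitions and the dimension rather than on the number of examples.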

Mini batch gradient descent

Also uses AggregateGradientDescent; the experiment is run by the job me.lorand.slogreg.jobs.MiniBatchJob
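Mini-batch gradient descent computes each step's gradient on a random subset of the data instead of the full dataset. A minimal sketch, assuming Bernoulli sampling with a fixed fraction and a fresh seed per iteration (analogous to calling `RDD.sample(withReplacement = false, fraction, seed)` in Spark); `MiniBatchSketch`, `sampleBatch` and `train` are hypothetical names, and the repository's sampling strategy may differ.

```scala
// Hypothetical sketch: mini-batch gradient descent for logistic regression.
object MiniBatchSketch {
  final case class Example(label: Double, features: Array[Int], values: Array[Double])

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Keeps each element independently with probability `fraction` (sampling without replacement)
  def sampleBatch[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    val rng = new scala.util.Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  def train(data: Seq[Example], dim: Int, iterations: Int, fraction: Double, learningRate: Double): Array[Double] = {
    val w = new Array[Double](dim)
    for (t <- 0 until iterations) {
      val batch = sampleBatch(data, fraction, seed = t.toLong) // a new sample every iteration
      if (batch.nonEmpty) {
        val grad = new Array[Double](dim)
        for (x <- batch) {
          val margin = x.features.indices.map(k => w(x.features(k)) * x.values(k)).sum
          val r = sigmoid(margin) - x.label
          for (k <- x.features.indices) grad(x.features(k)) += r * x.values(k)
        }
        // Average over the batch, not over the full dataset
        for (j <- 0 until dim) w(j) -= learningRate * grad(j) / batch.size
      }
    }
    w
  }
}
```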

ADAM

me.lorand.slogreg.optimize.Adam extends AggregateGradientDescent, and the experiment is run in me.lorand.slogreg.jobs.MomentumJob
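Adam keeps exponentially decaying averages of the gradient and of its element-wise square, corrects both for their bias towards zero in early steps, and scales each coordinate's step by the ratio of the two. The sketch below is the standard update rule from the Adam paper, with that paper's suggested default hyperparameters; whether `me.lorand.slogreg.optimize.Adam` uses the same defaults is an assumption, and `AdamSketch` is a hypothetical name. It operates on a dense weight vector and takes the (already aggregated) gradient as input.

```scala
// Hypothetical sketch: the standard Adam update on a dense weight vector.
object AdamSketch {
  final class Adam(dim: Int, learningRate: Double = 0.001, beta1: Double = 0.9,
                   beta2: Double = 0.999, epsilon: Double = 1e-8) {
    private val m = new Array[Double](dim) // first-moment (mean) estimate
    private val v = new Array[Double](dim) // second-moment (uncentered variance) estimate
    private var t = 0                      // step counter used for bias correction

    /** Applies one Adam update to `w` in place, given the gradient `g`. */
    def step(w: Array[Double], g: Array[Double]): Unit = {
      t += 1
      val c1 = 1.0 - math.pow(beta1, t)
      val c2 = 1.0 - math.pow(beta2, t)
      for (j <- 0 until dim) {
        m(j) = beta1 * m(j) + (1.0 - beta1) * g(j)
        v(j) = beta2 * v(j) + (1.0 - beta2) * g(j) * g(j)
        val mHat = m(j) / c1
        val vHat = v(j) / c2
        w(j) -= learningRate * mHat / (math.sqrt(vHat) + epsilon)
      }
    }
  }
}
```

On the first step the bias-corrected moments equal `g(j)` and `g(j) * g(j)`, so every coordinate with a non-zero gradient moves by roughly `learningRate` against the sign of its gradient, regardless of the gradient's magnitude.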

Time per iteration

Measured on an AWS EMR cluster of 5 m4.2xlarge nodes

The initial version takes almost 4 minutes per iteration; the best version takes half a second, close to a 480-fold speedup.
