Skip to content

Commit 5119631

Browse files
author
joyjxu
committedAug 27, 2019
Merge remote-tracking branch 'upstream/master'
2 parents 0a0d3d6 + dd990ae commit 5119631

File tree

17 files changed

+434
-433
lines changed

17 files changed

+434
-433
lines changed
 

‎README.md

+3-66
Original file line numberDiff line numberDiff line change
@@ -48,71 +48,7 @@ Figure 4 provides an example of running distributed machine learning algorithms
4848

4949
## Quick Start
5050
SONA supports three types of runtime models: YARN, K8s and Local. The local mode enable it easy to debug.
51-
52-
The SONA job is essentially a Spark Application with an associated Angel-PS application.
53-
After the job is successfully submitted, there will be two separate Applications on the cluster,
54-
one is the Spark Application and the other is the Angel-PS Application. The two Applications are not coupled.
55-
If the SONA job is deleted, users are required to kill both the Spark and Angel-PS Applications manually.
56-
57-
```bash
58-
#! /bin/bash
59-
- cd angel-<version>-bin/bin;
60-
- ./SONA-example
61-
```
62-
63-
The context of the submit scripts is as following:
64-
```bash
65-
#! /bin/bash
66-
source ./spark-on-angel-env.sh
67-
$SPARK_HOME/bin/spark-submit \
68-
--master yarn-cluster \
69-
--conf spark.ps.jars=$SONA_ANGEL_JARS \
70-
--conf spark.ps.instances=10 \
71-
--conf spark.ps.cores=2 \
72-
--conf spark.ps.memory=6g \
73-
--queue g_teg_angel-offline \
74-
--jars $SONA_SPARK_JARS \
75-
--name "BreezeSGD-spark-on-angel" \
76-
--driver-memory 10g \
77-
--num-executors 10 \
78-
--executor-cores 2 \
79-
--executor-memory 4g \
80-
--class com.tencent.angel.spark.examples.ml.BreezeSGD \
81-
./../lib/spark-on-angel-examples-${ANGEL_VERSION}.jar
82-
```
83-
84-
Users are encouraged to program instead of just using bash script. here is an example:
85-
```scala
86-
import com.tencent.angel.sona.core.DriverContext
87-
import org.apache.spark.angel.ml.classification.AngelClassifier
88-
import org.apache.spark.angel.ml.feature.LabeledPoint
89-
import org.apache.spark.angel.ml.linalg.Vectors
90-
import org.apache.spark.SparkConf
91-
import org.apache.spark.sql.{DataFrameReader, SparkSession}
92-
93-
val spark = SparkSession.builder()
94-
.master("local[2]")
95-
.appName("AngelClassification")
96-
.getOrCreate()
97-
98-
val libsvm = spark.read.format("libsvmex")
99-
val dummy = spark.read.format("dummy")
100-
101-
val trainData = libsvm.load("./data/angel/census/census_148d_train.libsvm")
102-
103-
val classifier = new AngelClassifier()
104-
.setModelJsonFile("./angelml/src/test/jsons/daw.json")
105-
.setNumClass(2)
106-
.setNumBatch(10)
107-
.setMaxIter(2)
108-
.setLearningRate(0.1)
109-
.setNumField(13)
110-
111-
val model = classifier.fit(trainData)
112-
113-
model.write.overwrite().save("trained_models/daw")
114-
```
115-
51+
[sona quick start](./docs/tutorials/sona_quick_start.md)
11652

11753
## Algorithms
11854
- machine learning algorithms:
@@ -124,6 +60,8 @@ model.write.overwrite().save("trained_models/daw")
12460
- [Robust Regression](docs/algo/robust_sona_en.md)
12561
- [Gradient Boosting Decision Tree](docs/GBDT.md)
12662
- [Hyper-Parameter Tuning](docs/AutoML.md)
63+
- [FTRL](docs/algo/ftrl_lr_sona_en.md)
64+
- [FTRL-FM](docs/algo/ftrl_fm_sona_en.md)
12765
+ Deep Learning Methods
12866
- [Deep Neural Network(DNN)](docs/algo/dnn_sona_en.md)
12967
- [Mix Logistic Regression(MLR)](docs/algo/mlr_sona_en.md)
@@ -147,4 +85,3 @@ model.write.overwrite().save("trained_models/daw")
14785
## References
14886

14987
## Other Resources
150-

‎angelml/bin/spark-on-angel-env.sh

+5-6
Original file line numberDiff line numberDiff line change
@@ -12,18 +12,17 @@ export HADOOP_HOME=
1212
export SPARK_HOME=
1313
export SONA_HOME=
1414
export SONA_HDFS_HOME=
15-
export SONA_VERSION=0.1.0-SNAPSHOT
16-
export ANGEL_VERSION=3.0.0-SNAPSHOT
17-
export ANGEL_UTILS_VERSION=0.1.0-SNAPSHOT
15+
export SONA_VERSION=0.1.0
16+
export ANGEL_VERSION=3.0.0
17+
export ANGEL_UTILS_VERSION=0.1.1
1818

1919

2020
scala_jar=scala-library-2.11.8.jar
21-
angel_ps_external_jar=fastutil-7.1.0.jar,htrace-core-2.05.jar,sizeof-0.3.0.jar,kryo-shaded-4.0.0.jar,minlog-1.3.0.jar,memory-0.8.1.jar,commons-pool-1.6.jar,netty-all-4.1.1.Final.jar,hll-1.6.0.jar
21+
angel_ps_external_jar=fastutil-7.1.0.jar,htrace-core-2.05.jar,sizeof-0.3.0.jar,kryo-shaded-4.0.0.jar,minlog-1.3.0.jar,memory-0.8.1.jar,commons-pool-1.6.jar,netty-all-4.1.17.Final.jar,hll-1.6.0.jar
2222
angel_ps_jar=angel-format-${ANGEL_UTILS_VERSION}.jar,angel-mlcore-${ANGEL_UTILS_VERSION}.jar,angel-ps-core-${ANGEL_VERSION}.jar,angel-ps-mllib-${ANGEL_VERSION}.jar,angel-ps-psf-${ANGEL_VERSION}.jar,angel-math-${ANGEL_UTILS_VERSION}.jar,angel-ps-graph-${ANGEL_VERSION}.jar
2323

2424
sona_jar=core-${SONA_VERSION}.jar,angelml-${SONA_VERSION}.jar
25-
sona_external_jar=fastutil-7.1.0.jar,htrace-core-2.05.jar,sizeof-0.3.0.jar,kryo-shaded-4.0.0.jar,minlog-1.3.0.jar,memory-0.8.1.jar,commons-pool-1.6.jar,netty-all-4.1.1.Final.jar,hll-1.6.0.jar,jniloader-1.1.jar,native_system-java-1.1.jar,arpack_combined_all-0.1.jar,core-1.1.2.jar,netlib-native_ref-linux-armhf-1.1-natives.jar,netlib-native_ref-linux-i686-1.1-natives.jar,netlib-native_ref-linux-x86_64-1.1-natives.jar,netlib-native_system-linux-armhf-1.1-natives.jar,netlib-native_system-linux-i686-1.1-natives.jar,netlib-native_system-linux-x86_64-1.1-natives.jar,jettison-1.4.0.jar,json4s-native_2.11-3.2.11.jar
26-
#sona_external_jar=fastutil-7.1.0.jar,htrace-core-2.05.jar,sizeof-0.3.0.jar,kryo-shaded-4.0.0.jar,minlog-1.3.0.jar,memory-0.8.1.jar,commons-pool-1.6.jar,netty-all-4.1.1.Final.jar,hll-1.6.0.jar,json4s-jackson_2.11-3.4.2.jar
25+
sona_external_jar=fastutil-7.1.0.jar,htrace-core-2.05.jar,sizeof-0.3.0.jar,kryo-shaded-4.0.0.jar,minlog-1.3.0.jar,memory-0.8.1.jar,commons-pool-1.6.jar,netty-all-4.1.17.Final.jar,hll-1.6.0.jar,jniloader-1.1.jar,native_system-java-1.1.jar,arpack_combined_all-0.1.jar,core-1.1.2.jar,netlib-native_ref-linux-armhf-1.1-natives.jar,netlib-native_ref-linux-i686-1.1-natives.jar,netlib-native_ref-linux-x86_64-1.1-natives.jar,netlib-native_system-linux-armhf-1.1-natives.jar,netlib-native_system-linux-i686-1.1-natives.jar,netlib-native_system-linux-x86_64-1.1-natives.jar,jettison-1.4.0.jar,json4s-native_2.11-3.2.11.jar
2726

2827
dist_jar=${angel_ps_external_jar},${angel_ps_jar},${scala_jar},${sona_jar}
2928
local_jar=${sona_external_jar},${angel_ps_jar},${sona_jar}

‎angelml/pom.xml

+4-4
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
<parent>
2222
<groupId>org.apache.spark.angel</groupId>
2323
<artifactId>sona</artifactId>
24-
<version>0.1.0-SNAPSHOT</version>
24+
<version>0.1.0</version>
2525
<relativePath>../pom.xml</relativePath>
2626
</parent>
2727

@@ -57,7 +57,7 @@
5757
<dependency>
5858
<groupId>com.tencent.angel</groupId>
5959
<artifactId>angel-format</artifactId>
60-
<version>0.1.0-SNAPSHOT</version>
60+
<version>0.1.1</version>
6161
<exclusions>
6262
<exclusion>
6363
<groupId>org.json4s</groupId>
@@ -76,7 +76,7 @@
7676
<dependency>
7777
<groupId>com.tencent.angel</groupId>
7878
<artifactId>angel-mlcore</artifactId>
79-
<version>0.1.0-SNAPSHOT</version>
79+
<version>0.1.1</version>
8080
<exclusions>
8181
<exclusion>
8282
<groupId>org.json4s</groupId>
@@ -91,7 +91,7 @@
9191
<dependency>
9292
<groupId>com.tencent.angel</groupId>
9393
<artifactId>angel-automl</artifactId>
94-
<version>0.1.0-SNAPSHOT</version>
94+
<version>0.1.0</version>
9595
</dependency>
9696
<dependency>
9797
<groupId>com.tencent.angel</groupId>

‎angelml/src/main/scala/org/apache/spark/angel/examples/AutoJsonRunnerExample.scala

-2
Original file line numberDiff line numberDiff line change
@@ -67,8 +67,6 @@ object AutoJsonRunnerExample {
6767

6868
def main(args: Array[String]): Unit = {
6969

70-
val defaultInput = "hdfs://tl-nn-tdw.tencent-distribute.com:54310/user/tdw_rachelsun/joyjxu/angel-test/daw_data/census_148d_train.libsvm"
71-
val defaultOutput = "hdfs://tl-nn-tdw.tencent-distribute.com:54310/user/tdw_rachelsun/joyjxu/trained_models"
7270
val defaultJsonFile = "No json file parsed..."
7371
val defaultDataFormat = "org.apache.spark.ml.source.libsvm.LibSVMFileFormat"
7472

‎angelml/src/main/scala/org/apache/spark/angel/examples/JsonRunnerExamples.scala

+5-4
Original file line numberDiff line numberDiff line change
@@ -51,17 +51,18 @@ object JsonRunnerExamples {
5151
val dataFormat = params.getOrElse("dataFormat", "libsvm")//libsvm,dummy
5252
val actionType = params.getOrElse("actionType", "train")
5353
val jsonFile = params.getOrElse("jsonFile", "")
54-
val input = params.get("data").get
55-
val modelPath = params.get("modelPath").get
56-
val predict = params.get("predictPath").get
54+
val input = params.getOrElse("data", "")
55+
val modelPath = params.getOrElse("modelPath", "")
56+
val predict = params.getOrElse("predictPath", "")
5757
val numBatch = params.getOrElse("numBatch", "10").toInt
5858
val maxIter = params.getOrElse("maxIter", "2").toInt
5959
val lr = params.getOrElse("lr", "0.1").toFloat
6060
val numField = params.getOrElse("numField", "13").toInt
6161
val task = params.getOrElse("task", "classification")
62+
val master = params.getOrElse("master", "yarn-cluster")
6263

6364
val spark = SparkSession.builder()
64-
.master("yarn-cluster")
65+
.master(master)
6566
.appName("AngelClassification")
6667
.getOrCreate()
6768

‎angelml/src/main/scala/org/apache/spark/angel/examples/oneline_learning/FTRLExample.scala ‎angelml/src/main/scala/org/apache/spark/angel/examples/online_learning/FTRLExample.scala

+1-1
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
* the License.
1515
*
1616
*/
17-
package org.apache.spark.angel.examples.oneline_learning
17+
package org.apache.spark.angel.examples.online_learning
1818

1919
import com.tencent.angel.conf.AngelConf
2020
import com.tencent.angel.ml.math2.utils.{LabeledData, RowType}

‎angelml/src/main/scala/org/apache/spark/angel/examples/oneline_learning/FtrlFMExample.scala ‎angelml/src/main/scala/org/apache/spark/angel/examples/online_learning/FtrlFMExample.scala

+1-1
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
* the License.
1515
*
1616
*/
17-
package org.apache.spark.angel.examples.oneline_learning
17+
package org.apache.spark.angel.examples.online_learning
1818

1919
import com.tencent.angel.conf.AngelConf
2020
import com.tencent.angel.ml.math2.utils.RowType

‎core/pom.xml

+3-3
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@
2525
<parent>
2626
<groupId>org.apache.spark.angel</groupId>
2727
<artifactId>sona</artifactId>
28-
<version>0.1.0-SNAPSHOT</version>
28+
<version>0.1.0</version>
2929
<relativePath>../pom.xml</relativePath>
3030
</parent>
3131

@@ -36,7 +36,7 @@
3636
<dependency>
3737
<groupId>com.tencent.angel</groupId>
3838
<artifactId>angel-format</artifactId>
39-
<version>0.1.0-SNAPSHOT</version>
39+
<version>0.1.1</version>
4040
<exclusions>
4141
<exclusion>
4242
<groupId>io.netty</groupId>
@@ -51,7 +51,7 @@
5151
<dependency>
5252
<groupId>com.tencent.angel</groupId>
5353
<artifactId>angel-mlcore</artifactId>
54-
<version>0.1.0-SNAPSHOT</version>
54+
<version>0.1.1</version>
5555
<exclusions>
5656
<exclusion>
5757
<groupId>org.json4s</groupId>

‎core/src/main/scala/com/tencent/angel/sona/util/ConfUtils.scala

+32-19
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,19 @@
1-
/*
2-
* Tencent is pleased to support the open source community by making Angel available.
3-
*
4-
* Copyright (C) 2017-2018 THL A29 Limited, a Tencent company. All rights reserved.
5-
*
6-
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in
7-
* compliance with the License. You may obtain a copy of the License at
8-
*
9-
* https://opensource.org/licenses/Apache-2.0
10-
*
11-
* Unless required by applicable law or agreed to in writing, software distributed under the License
12-
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
13-
* or implied. See the License for the specific language governing permissions and limitations under
14-
* the License.
15-
*
16-
*/
1+
/*
2+
* Tencent is pleased to support the open source community by making Angel available.
3+
*
4+
* Copyright (C) 2017-2018 THL A29 Limited, a Tencent company. All rights reserved.
5+
*
6+
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in
7+
* compliance with the License. You may obtain a copy of the License at
8+
*
9+
* https://opensource.org/licenses/Apache-2.0
10+
*
11+
* Unless required by applicable law or agreed to in writing, software distributed under the License
12+
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
13+
* or implied. See the License for the specific language governing permissions and limitations under
14+
* the License.
15+
*
16+
*/
1717
package com.tencent.angel.sona.util
1818

1919
import java.util.UUID
@@ -100,11 +100,12 @@ object ConfUtils extends CompatibleLogging {
100100
val appName = conf.get("spark.app.name") + "-ps"
101101
val queue = conf.get("spark.yarn.queue", "root.default")
102102

103-
/** mode: YARN or LOCAL */
103+
/** mode: YARN , KUBERNETES or LOCAL */
104104
val master = conf.getOption("spark.master")
105105
val isLocal = if (master.isEmpty || master.get.toLowerCase.startsWith("local")) true else false
106106
val deployMode = if (isLocal) "LOCAL" else conf.get("spark.ps.mode", DEFAULT_ANGEL_DEPLOY_MODE)
107107

108+
val masterMem = conf.getSizeAsGb("spark.angel.master.memory", "2g").toInt
108109
val psNum = conf.getInt("spark.ps.instances", 1)
109110
val psCores = conf.getInt("spark.ps.cores", 1)
110111
val psMem = conf.getSizeAsGb("spark.ps.memory", "4g").toInt
@@ -147,7 +148,13 @@ object ConfUtils extends CompatibleLogging {
147148
hadoopConf.set(ANGEL_ACTION_TYPE, "train")
148149
hadoopConf.set(ANGEL_SAVE_MODEL_PATH, tempPath)
149150

151+
if (deployMode == "KUBERNETES") {
152+
hadoopConf.set(ANGEL_KUBERNETES_MASTER, master.get.substring("k8s://".length))
153+
}
154+
150155
// Setting resource
156+
hadoopConf.setInt(ANGEL_AM_MEMORY_GB, masterMem)
157+
151158
hadoopConf.setInt(ANGEL_PS_NUMBER, psNum)
152159
hadoopConf.setInt(ANGEL_PS_CPU_VCORES, psCores)
153160
hadoopConf.setInt(ANGEL_PS_MEMORY_GB, psMem)
@@ -165,9 +172,15 @@ object ConfUtils extends CompatibleLogging {
165172
hadoopConf.setInt(ANGEL_PSAGENT_CACHE_SYNC_TIMEINTERVAL_MS, 100000000)
166173
hadoopConf.set(ANGEL_LOG_PATH, tempPath)
167174

168-
// add user resource files
169-
addUserResourceFiles(conf, hadoopConf)
175+
if (deployMode != "KUBERNETES") {
176+
// add user resource files
177+
addUserResourceFiles(conf, hadoopConf)
178+
}
170179

180+
// Some other settings
181+
conf.getAllWithPrefix("spark.angel").foreach {
182+
case (key, value) => hadoopConf.set(s"angel$key", value)
183+
}
171184
hadoopConf
172185
}
173186

‎dist/pom.xml

+1-1
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@
2424
<parent>
2525
<groupId>org.apache.spark.angel</groupId>
2626
<artifactId>sona</artifactId>
27-
<version>0.1.0-SNAPSHOT</version>
27+
<version>0.1.0</version>
2828
</parent>
2929

3030
<artifactId>sona-dist</artifactId>

‎docs/algo/ftrl_fm_sona_en.md

+117
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
# Training Factorization Machine with FTRL on Spark on Angel
2+
3+
> FM(Factorization Machine) is an algorithm based on matrix decomposition which can predict any real-valued vector.
4+
5+
> Its main advantages include:
6+
7+
- can handle highly sparse data;
8+
- linear computational complexity
9+
10+
> FTRL (Follow-the-regularized-leader) is an optimization algorithm which is widely deployed by online learning. Employing FTRL is easy in Spark-on-Angel and you can train a model with billions, even ten billions, dimensions once you have enough machines.
11+
12+
Here, we will use FTRL Optimizer to update the parameters of FM.
13+
14+
If you are not familiar with how to programming on Spark-on-Angel, please first refer to [Programming Guide for Spark-on-Angel](https://github.com/Angel-ML/angel/blob/master/docs/programmers_guide/spark_on_angel_programing_guide_en.md);
15+
16+
## Factorization Model
17+
18+
![model](http://latex.codecogs.com/png.latex?\dpi{150}\hat{y}(x)=b+\sum_{i=1}^n{w_ix_i}+\sum_{i=1}^{n-1}\sum_{j=i+1}^n<v_i,v_j>x_ix_j)
19+
20+
where ![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20<v_i,v_j>) is the dot of two k-dimension vector:
21+
22+
![dot](http://latex.codecogs.com/png.latex?\dpi{150}\inline%20<v_i,v_j>=\sum_{i=1}^kv_{i,f}\cdot%20v_{j,f})
23+
24+
model parameters:
25+
![parameter](http://latex.codecogs.com/png.latex?\dpi{100}\inlinew_0\in%20w\in%20R^n,V\in%20R^{n\times%20k})
26+
, where n is the number of feature, ![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20v_i) represents feature i composed by k factors, k is a hyperparameter that determines the factorization.
27+
28+
29+
## Using the FTRL-FM
30+
31+
```scala
32+
33+
import com.tencent.angel.ml.math2.utils.RowType
34+
import org.apache.spark.angel.ml.online_learning.FtrlFM
35+
36+
// allocate a ftrl optimizer with (lambda1, lambda2, alpha, beta)
37+
val optim = new FtrlFM(lambda1, lambda2, alpha, beta)
38+
// initializing the model
39+
optim.init(dim, factor)
40+
```
41+
42+
There are four hyper-parameters for the FTRL optimizer, which are lambda1, lambda2, alpha and beta. We allocate a FTRL optimizer with these four hyper-parameters. The next step is to initialized a FtrlFM model. There are two matrixs for FtrlFM, including `first` and `second`, the `first` contains the z, n and w in which z and n are used to init or update parameter w in FM, the `second` contains the z, n and v in which z and n are used to init or update parameter v in FM. In the aboving code, we allocate `first` a sparse distributed matrix with 3 rows and dim columns, and allocate `second` a sparse distributed matrix with 3 * factor rows and dim columns.
43+
44+
### set the dimension
45+
In the scenaro of online learning, the index of features can be range from (int.min, int.max), which is usually generated by a hash function. In Spark-on-Angel, you can set the dim=-1 when your feature index range from (int.min, int.max) and rowType is sparse. If the feature index range from [0, n), you can set the dim=n.
46+
47+
48+
## Training with Spark
49+
50+
### loading data
51+
Using the interface of RDD to load data and parse them to vectors.
52+
53+
```scala
54+
val data = sc.textFile(input).repartition(partNum)
55+
.map(s => (DataLoader.parseIntFloat(s, dim), DataLoader.parseLabel(s, false)))
56+
.map {
57+
f =>
58+
f._1.setY(f._2)
59+
f._1
60+
}
61+
```
62+
### training model
63+
64+
```scala
65+
val size = data.count()
66+
for (epoch <- 1 to numEpoch) {
67+
val totalLoss = data.mapPartitions {
68+
case iterator =>
69+
// for each partition
70+
val loss = iterator
71+
.sliding(batchSize, batchSize)
72+
.zipWithIndex
73+
.map(f => optim.optimize(f._2, f_1.toArray)).sum
74+
Iterator.single(loss)
75+
}.sum()
76+
println(s"epoch=$epoch loss=${totalLoss / size}")
77+
}
78+
```
79+
80+
81+
### saving model
82+
83+
```scala
84+
output = "hdfs://xxx"
85+
optim.weight
86+
optim.save(output + "/back")
87+
optim.saveWeight(output)
88+
```
89+
90+
### Submit Command
91+
92+
```shell
93+
source ./bin/spark-on-angel-env.sh
94+
95+
$SPARK_HOME/bin/spark-submit \
96+
--master yarn-cluster \
97+
--conf spark.yarn.allocation.am.maxMemory=55g \
98+
--conf spark.yarn.allocation.executor.maxMemory=55g \
99+
--conf spark.driver.maxResultSize=20g \
100+
--conf spark.kryoserializer.buffer.max=2000m\
101+
--conf spark.ps.jars=$SONA_ANGEL_JARS \
102+
--conf spark.ps.instances=1 \
103+
--conf spark.ps.cores=2 \
104+
--conf spark.ps.memory=5g \
105+
--conf spark.ps.log.level=INFO \
106+
--conf spark.offline.evaluate=200\
107+
--jars $SONA_SPARK_JARS \
108+
--name "FTRLFM on Spark-on-Angel" \
109+
--driver-memory 5g \
110+
--num-executors 5 \
111+
--executor-cores 2 \
112+
--executor-memory 2g \
113+
--class org.apache.spark.angel.examples.online_learning.FtrlFMExample \
114+
./lib/angelml-${SONA_VERSION}.jar \
115+
input:$input modelPath:$model dim:$dim batchSize:$batchSize actionType:train factor:5
116+
```
117+
[detail parameters](../../angelml/src/main/scala/org/apache/spark/angel/examples/online_learning/FtrlFMExample.scala)

‎docs/algo/ftrl_lr_sona_en.md

+118
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
# Training Logistic Regression with FTRL on Spark on Angel
2+
3+
FTRL (Follow-the-regularized-leader) is an optimization algorithm which is widely deployed by online learning. Employing FTRL is easy in Spark-on-Angel and you can train a model with billions, even ten billions, dimensions once you have enough machines.
4+
5+
If you are not familiar with how to programming on Spark-on-Angel, please first refer to [Programming Guide for Spark-on-Angel](https://github.com/Angel-ML/angel/blob/master/docs/programmers_guide/spark_on_angel_programing_guide_en.md);
6+
7+
## FTRL Optimizer
8+
9+
The FTRL algorithm takes into account the advantages of both FOBOS and RDA algorithms. It not only guarantees high precision with FOBOS, but also produces better sparsity with loss of certain precision.
10+
The update formula for the feature weight of the algorithm (Reference 1) is:
11+
12+
![](../imgs/ftrl_lr_w.png)
13+
14+
where the ![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20G^{(1:t)}=\sum_{s=1}^t{G^{s}}) represents the gradient of loss function.
15+
16+
The update formula for the feature weight of the algorithm can be decomposed into N independent scalar minimization problems for each dimension of feature weight.
17+
18+
![](../imgs/ftrl_lr_w_update.png)
19+
20+
![](../imgs/ftrl_lr_d_t.png)
21+
22+
where the ![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20{z_i}) and ![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20{n_i}) are updated as follows:
23+
24+
![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20{z_i}^{(t)}={z_i}^{(t-1)}\+{g_i}^t-\(\frac{1}{{\eta_i}^{(t)}}\-\frac{1}{{\eta_i}^{(t-1)}}\){w_i}^{(t)})
25+
26+
![](http://latex.codecogs.com/png.latex?\dpi{100}\inline%20{n_i}^{(t)}={n_i}^{(t-1)}\+({g_i}^{(t)})^2)
27+
28+
29+
## Using the FTRL Optimizer
30+
31+
```scala
32+
33+
import com.tencent.angel.ml.math2.utils.RowType
34+
import org.apache.spark.angel.ml.online_learning.FTRL
35+
36+
// allocate a ftrl optimizer with (lambda1, lambda2, alpha, beta)
37+
val optim = new FTRL(lambda1, lambda2, alpha, beta)
38+
// initializing the model
39+
optim.init(dim)
40+
```
41+
42+
There are four hyper-parameters for the FTRL optimizer, which are lambda1, lambda2, alpha and beta. We allocate a FTRL optimizer with these four hyper-parameters. The next step is to initialized a FTRL model. There are three vectors for FTRL, including z, n and w. In the aboving code, we allocate a sparse distributed matrix with 3 rows and dim columns.
43+
44+
### set the dimension
45+
In the scenaro of online learning, the index of features can be range from (long.min, long.max), which is usually generated by a hash function. In Spark-on-Angel, you can set the dim=-1 when your feature index range from (long.min, long.max) and rowType is sparse. If the feature index range from [0, n), you can set the dim=n.
46+
47+
## Training with Spark
48+
49+
### loading data
50+
Using the interface of RDD to load data and parse them to vectors.
51+
52+
```scala
53+
val data = sc.textFile(input).repartition(partNum)
54+
.map(s => (DataLoader.parseLongDouble(s, dim), DataLoader.parseLabel(s, false)))
55+
.map {
56+
f =>
57+
f._1.setY(f._2)
58+
f._1
59+
}
60+
```
61+
### training model
62+
63+
```scala
64+
val size = data.count()
65+
for (epoch <- 1 to numEpoch) {
66+
val totalLoss = data.mapPartitions {
67+
case iterator =>
68+
// for each partition
69+
val loss = iterator
70+
.sliding(batchSize, batchSize)
71+
.map(f => optim.optimize(f.toArray)).sum
72+
Iterator.single(loss)
73+
}.sum()
74+
println(s"epoch=$epoch loss=${totalLoss / size}")
75+
}
76+
```
77+
78+
79+
### saving model
80+
81+
```scala
82+
output = "hdfs://xxx"
83+
optim.weight
84+
optim.saveWeight(output)
85+
optim.save(output + "/back")
86+
```
87+
88+
### Submit Command
89+
90+
```shell
91+
source ./bin/spark-on-angel-env.sh
92+
93+
$SPARK_HOME/bin/spark-submit \
94+
--master yarn-cluster \
95+
--conf spark.yarn.allocation.am.maxMemory=55g \
96+
--conf spark.yarn.allocation.executor.maxMemory=55g \
97+
--conf spark.driver.maxResultSize=20g \
98+
--conf spark.kryoserializer.buffer.max=2000m\
99+
--conf spark.ps.jars=$SONA_ANGEL_JARS \
100+
--conf spark.ps.instances=1 \
101+
--conf spark.ps.cores=2 \
102+
--conf spark.ps.memory=5g \
103+
--conf spark.ps.log.level=INFO \
104+
--conf spark.offline.evaluate=200\
105+
--jars $SONA_SPARK_JARS \
106+
--name "FTRL on Spark-on-Angel" \
107+
--driver-memory 5g \
108+
--num-executors 5 \
109+
--executor-cores 2 \
110+
--executor-memory 2g \
111+
--class org.apache.spark.angel.examples.online_learning.FTRLExample \
112+
./lib/angelml-${SONA_VERSION}.jar \
113+
input:$input modelPath:$model dim:$dim batchSize:$batchSize actionType:train
114+
```
115+
[detail parameters](../../angelml/src/main/scala/org/apache/spark/angel/examples/online_learning/FTRLExample.scala)
116+
117+
##References
118+
1. H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young. Ad Click Prediction: a View from the Trenches.KDD’13, August 11–14, 2013

‎docs/line_en.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,6 @@ $SPARK_HOME/bin/spark-submit \
9393
--executor-cores 4 \
9494
--executor-memory 10g \
9595
--class org.apache.spark.angel.examples.graph.LINEExample2 \
96-
./lib/angelml-0.1.0-SNAPSHOT.jar
96+
./lib/angelml-0.1.0.jar
9797
input:$input output:$output embedding:128 negative:5 epoch:100 stepSize:0.01 batchSize:1000 numParts:2 subSample:false remapping:false order:2 interval:5
9898
```

‎docs/tutorials/sona_quick_start.md

+141
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,141 @@
1+
# SONA(Spark on Angel) Quick Start
2+
3+
he SONA job is essentially a Spark Application with an associated Angel-PS application.
4+
After the job is successfully submitted, there will be two separate Applications on the cluster,
5+
one is the Spark Application and the other is the Angel-PS Application. The two Applications are not coupled.
6+
If the SONA job is deleted, users are required to kill both the Spark and Angel-PS Applications manually.
7+
8+
## Deployment Process
9+
10+
1. **Install Spark**
11+
2. **Install SONA**
12+
1. unzip sona-\<version\>-bin.zip
13+
2. set these environmental variables: `SPARK_HOME`, `SONA_HOME`, `SONA_HDFS_HOME` in sona-\<version\>-bin/bin/spark-on-angl-env.sh
14+
3. put the extracted folder `sona-\<version\>-bin` into `SONA_HDFS_HOME`
15+
16+
3. Configure Environment Variables
17+
18+
- need to import environment script:source ./spark-on-angel-env.sh
19+
- configure the Jar package location:spark.ps.jars=\$SONA_ANGEL_JARS和--jars \$SONA_SPARK_JARS
20+
21+
## Submit Task
22+
23+
After completing sona program coding, then package it. At last, use `spark-submit` script to submit the task.
24+
25+
26+
## Run Example(Logistic Regression)
27+
28+
```bash
29+
#! /bin/bash
30+
- cd sona-<version>-bin/bin;
31+
- ./SONA-example
32+
```
33+
34+
script as follows:
35+
36+
```bash
37+
#!/bin/bash
38+
39+
source ./spark-on-angel-env.sh
40+
41+
$SPARK_HOME/bin/spark-submit \
42+
--master yarn-cluster \
43+
--conf spark.ps.jars=$SONA_ANGEL_JARS \
44+
--conf spark.ps.instances=10 \
45+
--conf spark.ps.cores=2 \
46+
--conf spark.ps.memory=6g \
47+
--jars $SONA_SPARK_JARS\
48+
--name "LR-spark-on-angel" \
49+
--files <logreg.json path> \
50+
--driver-memory 10g \
51+
--num-executors 10 \
52+
--executor-cores 2 \
53+
--executor-memory 4g \
54+
--class org.apache.spark.angel.examples.JsonRunnerExamples \
55+
./../lib/angelml-${SONA_VERSION}.jar \
56+
data:<input_path> \
57+
modelPath:<output_path> \
58+
jsonFile:./logreg.json \
59+
lr:0.1
60+
```
61+
62+
> Attention: the parameters of Angel PS need to be set:`spark.ps.instance``spark.ps.cores``spark.ps.memory`
63+
> ```--files <logreg.json path>``` using this parameter to upload your local json file, here ```<logreg.json path>```is the local path of json(such as: xx/xx/logreg.json)
64+
> ```jsonFile:./logreg.json \``` this parameter is using the json you upload
65+
> resources such as: executor, driver, ps, depend on your dataset
66+
67+
68+
## LR Json Example
69+
70+
- [detail json](https://github.com/Angel-ML/angel/blob/master/docs/basic/json_conf_en.md)
71+
- [data](https://github.com/Angel-ML/angel/tree/master/data/a9a/a9a_123d_train.libsvm)
72+
73+
```json
74+
{
75+
"data": {
76+
"format": "libsvm",
77+
"indexrange": 123,
78+
"validateratio": 0.1,
79+
"sampleratio": 1.0
80+
},
81+
"train": {
82+
"epoch": 10,
83+
"lr": 0.5
84+
},
85+
"model": {
86+
"modeltype": "T_DENSE_SPARSE"
87+
},
88+
"default_optimizer": {
89+
"type": "momentum",
90+
"momentum": 0.9,
91+
"reg2": 0.001
92+
},
93+
"layers": [
94+
{
95+
"name": "wide",
96+
"type": "simpleinputlayer",
97+
"outputdim": 1,
98+
"transfunc": "identity"
99+
},
100+
{
101+
"name": "simplelosslayer",
102+
"type": "simplelosslayer",
103+
"lossfunc": "logloss",
104+
"inputlayer": "wide"
105+
}
106+
]
107+
}
108+
109+
```
110+
111+
Users are encouraged to program instead of just using bash script. here is an example:
112+
113+
```scala
114+
import com.tencent.angel.sona.core.DriverContext
115+
import org.apache.spark.angel.ml.classification.AngelClassifier
116+
import org.apache.spark.angel.ml.feature.LabeledPoint
117+
import org.apache.spark.angel.ml.linalg.Vectors
118+
import org.apache.spark.SparkConf
119+
import org.apache.spark.sql.{DataFrameReader, SparkSession}
120+
121+
val spark = SparkSession.builder()
122+
.master("local[2]")
123+
.appName("AngelClassification")
124+
.getOrCreate()
125+
126+
val libsvm = spark.read.format("libsvmex")
127+
val dummy = spark.read.format("dummy")
128+
129+
val trainData = libsvm.load("./data/angel/a9a/a9a_123d_train.libsvm")
130+
131+
val classifier = new AngelClassifier()
132+
.setModelJsonFile("./angelml/src/test/jsons/logreg.json")
133+
.setNumClass(2)
134+
.setNumBatch(10)
135+
.setMaxIter(2)
136+
.setLearningRate(0.1)
137+
138+
val model = classifier.fit(trainData)
139+
140+
model.write.overwrite().save("trained_models/lr")
141+
```

‎examples/pom.xml

-225
This file was deleted.

‎examples/src/main/scala/org/apache/spark/examples/JsonRunner.scala

-98
This file was deleted.

‎pom.xml

+2-2
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
<modelVersion>4.0.0</modelVersion>
2222
<groupId>org.apache.spark.angel</groupId>
2323
<artifactId>sona</artifactId>
24-
<version>0.1.0-SNAPSHOT</version>
24+
<version>0.1.0</version>
2525
<packaging>pom</packaging>
2626
<name>Spark On Angel Project Parent POM</name>
2727
<url>http://spark.apache.org/</url>
@@ -36,7 +36,7 @@
3636
<scala.binary.version>2.11</scala.binary.version>
3737
<scala.version>2.11.8</scala.version>
3838
<spark.version>2.3.0</spark.version>
39-
<angel.version>3.0.0-SNAPSHOT</angel.version>
39+
<angel.version>3.0.0</angel.version>
4040
<breeze.version>0.13</breeze.version>
4141
<!-- Modules that copy jars to the build directory should do so under this location. -->
4242
<jars.target.dir>${project.build.directory}/scala-${scala.binary.version}/jars</jars.target.dir>

0 commit comments

Comments
 (0)
Please sign in to comment.