Commit f80f523

[DOCS] Improve psFunc & Architecture (#95)
1 parent 0ec78ab commit f80f523

4 files changed: +32 −21 lines

4 files changed

+32
-21
lines changed

docs/design/psfFunc_en.md (+4 −4)

@@ -1,16 +1,16 @@
 # psFunc (Parameter Server Func)

-Normally, a standard parameter server provides the basic **pull** and **push** functions. In practice, though, it is not as simple as using these basic functions to realize parameter requesting and updating for each specific algorithm, especially when the algorithm needs to implement some specific optimization mechanisms.
+Normally, a standard parameter server provides the basic **pull** and **push** functions. But in practice it is not that simple, especially when the algorithm needs to implement some specific optimization mechanisms.

-> For example, in some situations, an algorithm needs to get the maximum value of a row in the model matrix. If the PS system has only the basic pull interface, PSClient has to pull all the columns of this row back from the PS, and the worker calculates the maximum --- this approach generates a lot of network communication cost that will affect the performance. If we have a customized function, on the other hand, each PSServer can calculate a number of local maximums and exchange these values to decide on the overall maximum, returning a single value --- this approach reduces the communication overhead drastically while keeping the computation cost at the same level.
+> For example, in some situations, an algorithm needs to get the maximum value of a row in the `model matrix`. If the PS system has only the basic pull interface, PSClient has to pull all the columns of this row back from the PS, and the worker calculates the maximum --- this approach generates a lot of network communication cost that will affect the performance. If we have a customized function, on the other hand, each PSServer can calculate a local maximum and exchange these values to decide on the overall maximum, returning a single value --- this approach reduces the communication overhead drastically while keeping the computation cost at the same level.

-In order to solve these problems, Angel introduces and implements psFunc, which encapsulates and abstractizes the process of requesting and updating the remote model. psFunc is one type of user-defined functions (UDF) yet closely related to the PS operations (thus its name **psFunc**, abbreviated as **psf**). The overall architecture is shown below:
+In order to solve these problems, Angel introduces and implements psFunc, which encapsulates and abstracts the process of requesting and updating the remote model. psFunc is a type of user-defined function (UDF), yet one closely related to the PS operations (thus its name **psFunc**, abbreviated as **psf**). The overall architecture is shown below:

 ![](../img/angel_psFunc.png)

 > With the introduction of psFunc, some concrete computations can happen on the PSServer side; in other words, PSServer will no longer merely store the model, but will also actually be in charge of model computations to a certain extent. A sensible design of psFunc can significantly accelerate the algorithm's execution.

-It is worth noting that in many complex algorithm implementations, the introduction and strengthening of psFunc has greatly decreased the need of workers pulling back the entire model for overall calculations, resulting in a successful detour to **model parallelization**.
+It is worth mentioning that in many complex algorithm implementations, the introduction and strengthening of psFunc has greatly decreased the need for workers to pull back the entire model for overall calculations, resulting in a successful detour to **model parallelization**.

 We introduce psFunc in the following modules:
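The row-max example in the diff above can be sketched as a toy simulation in plain Python. This is illustrative only; the partition layout and function names are assumptions, not Angel's actual psFunc API:

```python
# One model row, partitioned column-wise across three simulated PSServers.
partitions = [
    [0.2, 1.7, 0.5],   # columns held by server 0
    [3.1, 0.4],        # columns held by server 1
    [2.6, 0.9, 1.1],   # columns held by server 2
]

def pull_then_max(parts):
    """Basic pull interface: every column crosses the network to the
    worker, which then computes the maximum locally."""
    full_row = [v for part in parts for v in part]  # traffic grows with row width
    return max(full_row)

def psf_max(parts):
    """psFunc style: each server reduces its own partition to a single
    scalar, so only one value per server crosses the network."""
    local_maxes = [max(part) for part in parts]  # computed server-side
    return max(local_maxes)                      # client merges partial results

# Both approaches agree on the answer; only the communication cost differs.
assert pull_then_max(partitions) == psf_max(partitions) == 3.1
```

The point of the sketch is the shape of the data flow: `pull_then_max` moves every column over the wire, while `psf_max` moves one scalar per server, which is the drastic reduction in communication overhead the document describes.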

docs/overview/architecture.md (+9 −5)

@@ -5,9 +5,13 @@
 ![][1]

 Angel's overall design is fairly simple and clearly layered, easy to pick up, without overly complex design; it focuses on models and machine-learning-related features, pursuing the best performance for high-dimensional models. Its architecture can be divided into 3 major modules:

-1. **Parameter Server Layer**: provides a common `Parameter Server` service, responsible for distributed model storage, communication synchronization and coordination of computation, and provides `PS Service` through PSAgent. 2. **Worker Layer**: distributed compute nodes designed around Angel's own model; they automatically read and partition data, locally train model deltas, and communicate with the `PS Server` through the `PS Client` to complete model training and prediction. One Worker contains one or more Tasks; a Task is Angel's computing unit, designed this way so that Tasks can share many of a Worker's common resources.
+1. **Parameter Server Layer**: provides a common `Parameter Server` service, responsible for distributed model storage, communication synchronization and coordination of computation, and provides `PS Service` through PSAgent.
+
+2. **Worker Layer**: distributed compute nodes designed around Angel's own model; they automatically read and partition data, locally train model deltas, and communicate with the `PS Server` through the `PS Client` to complete model training and prediction. One Worker contains one or more Tasks; a Task is Angel's computing unit, designed this way so that Tasks can share many of a Worker's common resources.
+
 3. **Model Layer**: a virtual abstraction layer, not a physically existing layer. Push/Pull of the model, various asynchronous controls, model partition routing, user-defined functions, and so on; it is the bridge connecting the Worker and PSServer layers.

 Besides these 3 modules, there are 2 important classes that are not shown in the figure but deserve attention:

 1. **Client**: the initiator of an Angel job

@@ -27,10 +31,10 @@ Angel's overall design is fairly simple and clearly layered, easy to pick up, wi
 With the design above, Angel's overall architecture has relatively good extensibility:

-* **PSServer Layer:** can both provide flexible multi-framework PS support through PS-Service
+* **PSServer Layer:** provides flexible multi-framework PS support through PS-Service
 * **Model Layer:** provides the functions essential to a PS, and supports targeted performance optimization
-* **Worker Layer:** satisfies Angel's needs for algorithm development and innovation based on its own API, achieving the best algorithm results
+* **Worker Layer:** supports algorithm development and innovation based on Angel's own API

 Therefore, distributed computing engineers can carry out all kinds of optimizations on the core layers, while algorithm engineers and data scientists can fully reuse these results and focus on implementing algorithmic techniques from academia, achieving the best performance and accuracy.

-[1]: ../img/angel_architecture_1.png
+[1]: ../img/angel_architecture_1.png

docs/overview/architecture_en.md (+17 −11)

@@ -4,33 +4,39 @@
 ![][1]

-The overall design of Angel is simple, clear-cut, easy-to-use, without excessive complications; it focuses on characteristics related to machine learning models and pursues the best performance of high-dimensional models. Angel's architecture design consists of three modules:
-1. **Parameter Server Layer** provides a common `Parameter Server` (PS) service, responsible for distributed model storage, communication synchronization and coordination of computing, also provide `PS Service` through PSAgent.2. **Worker Layer** consists of distributed compute nodes designed based on the Angel model, which automatically read and partition data, compute the model updates locally, communicate with the `PS Server` through the `PS Client`, and complete the model training and prediction generation. One worker contains one or more tasks, where a task is a computing unit in Angel; designed this way, the tasks are able to share many public resources that a worker has access to.3. **Model Layer** is an abstract, virtue layer without physical components, which hosts functionalities such as model pull/push operations, multiple sync protocols, model partitioner, psFunc, among others; this layer bridges between the worker layer and the PS layer.
+The overall design of Angel is simple, clear-cut, and easy to use, without excessively complex design. It focuses on characteristics related to machine learning and models, pursuing the best performance for high-dimensional models. In general, Angel's architecture design consists of three modules:
+
+1. **Parameter Server Layer**: provides a common `Parameter Server` (PS) service, responsible for distributed model storage, communication synchronization and coordination of computing; it also provides `PS Service` through PSAgent.
+
+2. **Worker Layer**: consists of distributed compute nodes designed based on the Angel model, which automatically read and partition data and compute the model delta locally. It communicates with the `PS Server` through the `PS Client` to complete the model training and prediction process. One worker may contain one or more tasks, where a task is a computing unit in Angel. In this way, tasks are able to share the public resources of a worker.
+
+3. **Model Layer**: is an abstract, virtual layer without physical components, which hosts functionalities such as model pull/push operations, multiple sync protocols, the model partitioner, psFunc, etc. This layer bridges the worker layer and the PS layer.
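The pull/compute/push cycle between the Worker and PS layers described above can be sketched as a toy simulation in plain Python. The class and method names here are illustrative assumptions, not Angel's actual API:

```python
class PSServer:
    """Holds (a partition of) the model and applies pushed deltas."""
    def __init__(self, dim):
        self.weights = [0.0] * dim

    def pull(self):
        return list(self.weights)       # copy sent back to the PS Client

    def push(self, delta):
        for i, d in enumerate(delta):   # merge the worker's local update
            self.weights[i] += d

class Task:
    """A computing unit inside a Worker: trains on its own data slice."""
    def __init__(self, data):
        self.data = data

    def compute_delta(self, weights):
        # Toy "training" step: nudge every weight toward the slice mean.
        target = sum(self.data) / len(self.data)
        return [0.5 * (target - w) for w in weights]

def run_iteration(server, tasks):
    """One pull -> local compute -> push cycle per task."""
    for task in tasks:
        weights = server.pull()          # PS Client pull
        delta = task.compute_delta(weights)
        server.push(delta)               # PS Client push

server = PSServer(dim=2)
tasks = [Task([1.0, 3.0]), Task([2.0, 4.0])]  # two tasks sharing one worker
run_iteration(server, tasks)
assert server.weights == [2.0, 2.0]
```

Note how the tasks never talk to each other directly: all coordination happens through the pulled weights and pushed deltas, which is what lets the real system distribute both the data (across tasks) and the model (across PS partitions).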
 In addition to the three modules, there are two important classes that deserve attention, though not shown in the chart:

 1. **Client**: the initiator of an Angel application

 * Start/stop PSServer
 * Start/stop Angel worker
-* Load/save the model
-* Start the specific computating process
+* Load/save the model
+* Start the specific computing process
 * Obtain application status

 2. **Master**: the guardian of an Angel application

-* Slice and distribute raw data and the parameter matrix
-* Request computing resources for worker and parameter server from Gaia
+* Slice and distribute raw data and the parameter matrix
+* Request computing resources for worker and parameter server from Yarn
 * Coordinate, manage and monitor worker and PSServer activities

-With the above design, Angel's overall architecture is comparatively easy to scale up.
+With the above design, Angel's overall architecture is comparatively easy to scale up.

-* **PSServer Layer**: provide flexible, multi-framework PS support through PS-Service
+* **PSServer Layer**: provides flexible, multi-framework PS support through PS-Service
 * **Model Layer**: provides the necessary functions of the PS and supports targeted optimization for performance
-* **Worker Layer**: satisfy needs for algorithm development and innovation based on independent API
+* **Worker Layer**: satisfies needs for algorithm development and innovation based on Angel's independent API

-Therefore, the distributed computing engineers can focus on optimization of the core layers, whereas the algorithm engineers and data scientists can focus on the algorithm development and implementation, making full use of the platform, to pursue the best performance and accuracy.
+Therefore, with Angel, distributed computing engineers can focus on optimization of the core layers, whereas algorithm engineers and data scientists can focus on algorithm development and implementation, making full use of the platform to pursue the best performance and accuracy.

 [1]: ../img/angel_architecture_1.png

docs/overview/code_framework.md (+2 −1)

@@ -30,8 +30,9 @@ Angel is oriented toward machine learning, so machine-learning-related elements
 Including:

-* Martrix
+* Matrix
 * Vector
+* Feature
 * Optimizer
 * Objective
 * Metric
