add developers guide
Han Wang committed Jan 22, 2022
1 parent 004ea37 commit a4eddf0
Showing 5 changed files with 154 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -0,0 +1,4 @@
DPGEN2 is the 2nd generation of the Deep Potential GENerator.

For developers please read the [developers guide](docs/developer.md)

81 changes: 81 additions & 0 deletions docs/developer.md
@@ -0,0 +1,81 @@
# Developers' guide

- [The concurrent learning algorithm](#the-concurrent-learning-algorithm)
- [Overview of the DPGEN2 implementation](#overview-of-the-dpgen2-implementation)
- [The DPGEN2 workflow](#the-dpgen2-workflow)
- [How to contribute](#how-to-contribute)

## The concurrent learning algorithm

DPGEN2 implements the concurrent learning algorithm named DP-GEN, described in [this paper](https://doi.org/10.1016/j.cpc.2020.107206). Note that other types of workflows, such as active learning, should be easy to implement within the infrastructure of DPGEN2.

The DP-GEN algorithm is an iterative algorithm. In each iteration, four steps are consecutively executed: training, exploration, selection and labeling.

1. **Training**. A set of DP models are trained with the same dataset and the same hyperparameters. The only difference is the random seed used to initialize the model parameters.
2. **Exploration**. One of the DP models is used to explore the configuration space. The strategy of exploration depends highly on the intended application of the model. The simulation technique for exploration can be molecular dynamics, Monte Carlo, structure search/optimization, enhanced sampling, or any combination of them. Currently DPGEN2 only supports exploration based on the molecular simulation platform [LAMMPS](https://www.lammps.org/).
3. **Selection**. Not all the explored configurations are labeled; rather, the model prediction errors on the configurations are estimated by the ***model deviation***, which is defined as the standard deviation of the predictions of the set of models. The critical configurations with large, but not too large, errors are selected for labeling. The configurations with very large errors are not selected, because such errors are usually caused by non-physical configurations, e.g. overlapping atoms.
4. **Labeling**. The selected configurations are labeled with the energy, forces and virial calculated by a method of first-principles accuracy. The most commonly used method is [density functional theory](https://doi.org/10.1103/PhysRev.140.A1133) as implemented in [VASP](https://www.vasp.at/), [Quantum ESPRESSO](https://www.quantum-espresso.org/), [CP2K](https://www.cp2k.org/), etc. The labeled data are finally added to the training dataset to start the next iteration.

In each iteration, the quality of the model is improved by selecting and labeling more critical data, and adding them to the training dataset. The DP-GEN iteration is converged when no more critical data can be selected.
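The selection rule in step 3 can be sketched in plain Python. The function names and the two trust-level thresholds below are illustrative, not part of DPGEN2's actual API:

```python
from statistics import pstdev
from typing import List, Sequence


def model_deviation(predictions: Sequence[float]) -> float:
    """Standard deviation of the predictions of an ensemble of models."""
    return pstdev(predictions)


def select_for_labeling(
        devs: Sequence[float],
        trust_lo: float,
        trust_hi: float,
) -> List[int]:
    """Pick configurations whose deviation is large, but not so large that
    the configuration is likely non-physical (e.g. overlapping atoms)."""
    return [i for i, d in enumerate(devs) if trust_lo <= d < trust_hi]
```

Configurations below `trust_lo` are already well described by the models, while those above `trust_hi` are discarded as likely non-physical; only the window in between is sent to labeling.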

## Overview of the DPGEN2 Implementation

The implementation of DPGEN2 is based on the workflow platform [dflow](https://github.com/dptech-corp/dflow), a Python wrapper of [Argo Workflows](https://argoproj.github.io/workflows/), an open-source container-native workflow engine on [Kubernetes](https://kubernetes.io/).

The DP-GEN algorithm is conceptually modeled as a computational graph. The implementation is then divided into two parts: the operators and the workflow.
1. **Operators**. Operators are implemented in Python 3. The operators should be implemented and tested ***without*** the workflow.
2. **Workflow**. The workflow is implemented on [dflow](https://github.com/dptech-corp/dflow). Ideally, the workflow is implemented and tested with all operators mocked.
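The "test the workflow with all operators mocked" idea can be sketched with plain `unittest.mock` stand-ins instead of real dflow OPs; the workflow function and return values here are hypothetical:

```python
from unittest import mock


def run_block(train_op, explore_op):
    """A toy workflow: train an ensemble, then explore with it."""
    models = train_op()
    return explore_op(models)


# Mock the operators so the workflow wiring can be tested in isolation.
train = mock.Mock(return_value=["model-0", "model-1"])
explore = mock.Mock(return_value="exploration-report")

result = run_block(train, explore)
```

The test then only asserts on how the operators were called and what the workflow returned, without ever running training or simulation.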


## The DPGEN2 workflow

The workflow of DPGEN2 is illustrated in the following figure:

![dpgen flowchart](./figs/dpgen-flowchart.jpg)

At the center is the `block` operator, which is a super-OP for one DP-GEN iteration, i.e. the super-OP of the training, exploration, selection and labeling steps. The inputs of the `block` OP are `lmp_task_group`, `conf_selector` and `dataset`.
- `lmp_task_group`: definition of a group of LAMMPS tasks that explore the configuration space.
- `conf_selector`: defines the rule by which the configurations are selected for labeling.
- `dataset`: the training dataset.

The outputs of the `block` OP are
- `exploration_report`: a report recording the result of the exploration.
- `dataset_incr`: the increment of the training dataset.

The `dataset_incr` is added to the training `dataset`.

The `exploration_report` is passed to the `exploration_strategy` OP. The `exploration_strategy` implements the strategy of exploration. It reads the `exploration_report` generated by each iteration (`block` OP), then decides whether the iteration is converged. If not, it generates a group of LAMMPS tasks (`lmp_task_group`) and the criteria for selecting configurations (`conf_selector`). The `lmp_task_group` and `conf_selector` are then used by the `block` of the next iteration. This closes the iteration loop.
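The iteration loop described above can be sketched as follows; the class and method names are illustrative stand-ins, not DPGEN2's actual interfaces:

```python
class DummyStrategy:
    """Stands in for exploration_strategy: reads reports, decides convergence."""

    def __init__(self):
        self.converged = False
        self.iterations = 0

    def plan(self):
        # Generate the LAMMPS tasks and the configuration-selection criteria.
        return "lmp_task_group", "conf_selector"

    def update(self, report):
        self.iterations += 1
        self.converged = (report == "converged")


def run_dpgen(strategy, run_block):
    """Run block iterations until the strategy reports convergence."""
    while not strategy.converged:
        tasks, selector = strategy.plan()
        strategy.update(run_block(tasks, selector))
    return strategy.iterations
```

Here `run_block` plays the role of the `block` super-OP: it consumes the task group and selector and produces an exploration report.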

### Inside the `block` operator

The inside of the super-OP `block` is displayed on the right-hand side of the figure. It contains
- `prep_run_dp_train`: prepares the training tasks of the DP models and runs them.
- `prep_run_lmp`: prepares the LAMMPS exploration tasks and runs them.
- `select_confs`: selects configurations for labeling from the explored configurations.
- `prep_run_fp`: prepares and runs first-principles tasks.
- `collect_data`: collects the `dataset_incr` and adds it to `dataset`.
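The data flow through these five sub-operators can be sketched as a plain function; the `ops` mapping and the dummy operators below are hypothetical stand-ins for the real super-OP wiring:

```python
def block(lmp_task_group, conf_selector, dataset, ops):
    """One DP-GEN iteration: train, explore, select, label, collect."""
    models = ops["prep_run_dp_train"](dataset)
    confs = ops["prep_run_lmp"](lmp_task_group, models)
    report, selected = ops["select_confs"](conf_selector, confs)
    labeled = ops["prep_run_fp"](selected)
    dataset_incr = ops["collect_data"](labeled)
    return report, dataset_incr
```

Each value produced by one sub-operator is consumed by the next, mirroring the edges drawn inside the `block` in the flowchart.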


### The exploration strategy

The exploration strategy defines how the configuration space is explored by the concurrent learning algorithm. The design of the exploration strategy is graphically illustrated in the following figure. The exploration is composed of stages. Only when the DP-GEN exploration converges at one stage (no configurations with large error are found) does the exploration proceed to the next stage. The whole procedure is controlled by the `exploration_scheduler`. Each stage has its own scheduler, which talks to the `exploration_scheduler` to generate the schedule for the DP-GEN algorithm.

![exploration strategy](./figs/exploration-strategy.jpg)

Some concepts are explained below:

- **Exploration group**. A group of LAMMPS tasks that share similar settings, for example, a group of NPT MD simulations in a certain thermodynamic space.
- **Exploration stage**. The `exploration_stage` contains a list of exploration groups. It contains all the information needed to define the `lmp_task_group` used by the `block` in a DP-GEN iteration.
- **Stage scheduler**. It guarantees the convergence of the DP-GEN algorithm in each `exploration_stage`. If the exploration is not converged, the `stage_scheduler` generates the `lmp_task_group` and `conf_selector` from the `exploration_stage` for the next iteration (possibly with different initial conditions, i.e. different initial configurations and randomly generated initial velocities).
- **Exploration scheduler**. The scheduler for the DP-GEN algorithm. When DP-GEN converges in one stage, it proceeds to the next stage until all planned stages are used.
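The stage-advancing logic of the exploration scheduler can be sketched as a small decision function (the names and return values are illustrative):

```python
def next_action(n_stages, stage_idx, stage_converged):
    """Decide what the exploration scheduler does after reading a report."""
    if not stage_converged:
        return ("continue", stage_idx)        # keep iterating in this stage
    if stage_idx + 1 < n_stages:
        return ("next_stage", stage_idx + 1)  # stage converged, move on
    return ("finished", stage_idx)            # all planned stages are used
```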


## How to contribute

Anyone who is interested in the DPGEN2 project may contribute from two aspects: operators and workflows.

One may check the [guide on writing operators](./operator.md).

The DP-GEN workflow is implemented in [dpgen2/flow/loop.py](https://github.com/wanghan-iapcm/dpgen2/blob/master/dpgen2/flow/loop.py) and tested with all operators mocked in [tests/test_loop.py](https://github.com/wanghan-iapcm/dpgen2/blob/master/tests/test_loop.py).

The sub-workflow in `block` is implemented in [dpgen2/flow/block.py](https://github.com/wanghan-iapcm/dpgen2/blob/master/dpgen2/flow/block.py) and tested with all operators mocked in [tests/test_block_cl.py](https://github.com/wanghan-iapcm/dpgen2/blob/master/tests/test_block_cl.py).
Binary file added docs/figs/dpgen-flowchart.jpg
Binary file added docs/figs/exploration-strategy.jpg
69 changes: 69 additions & 0 deletions docs/operator.md
@@ -0,0 +1,69 @@

# Operators

The operators are building blocks of the workflow.

DPGEN2 implements the OPs in Python. All OPs are derived from the base class `dflow.OP`. An example `OP`, `CollectData`, is provided below.

```python
from pathlib import Path
from typing import List, Set

from dflow.python import (
    OP,
    OPIO,
    OPIOSign,
    Artifact,
)

class CollectData(OP):
    @classmethod
    def get_input_sign(cls):
        return OPIOSign({
            "name": str,
            "labeled_data": Artifact(List[Path]),
            "iter_data": Artifact(Set[Path]),
        })

    @classmethod
    def get_output_sign(cls):
        return OPIOSign({
            "iter_data": Artifact(Set[Path]),
        })

    @OP.exec_sign_check
    def execute(
            self,
            ip: OPIO,
    ) -> OPIO:
        name = ip["name"]
        labeled_data = ip["labeled_data"]
        iter_data = ip["iter_data"]

        ## do work to generate new_iter_data
        ...
        ## done

        return OPIO({
            "iter_data": new_iter_data,
        })
```

The `dflow` platform requires static type definitions, i.e. the signatures of an OP, for the input and output variables. The input and output signatures of the `OP` are given by the `classmethod`s `get_input_sign` and `get_output_sign`.

The operator is executed via the method `OP.execute`. The input and output variables are recorded in `dict`s. The keys of the input/output `dict`s and the types of the input/output variables are checked against their signatures by the decorator `OP.exec_sign_check`. If any key or type does not match, an exception is raised.
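The idea behind the signature check can be illustrated in plain Python without dflow; this is a simplified stand-in for what `OP.exec_sign_check` does, not its actual implementation:

```python
def check_sign(data: dict, sign: dict) -> None:
    """Raise if the keys or value types do not match the declared signature."""
    if set(data) != set(sign):
        raise KeyError(f"keys {set(data)} do not match signature {set(sign)}")
    for key, typ in sign.items():
        if not isinstance(data[key], typ):
            raise TypeError(f"{key!r} must be of type {typ.__name__}")


# A well-formed input passes silently; a wrong key or type raises.
sign = {"name": str, "count": int}
check_sign({"name": "collect-data", "count": 3}, sign)
```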

The Python `OP`s are wrapped into `dflow` operators (named `Step`s) to construct the workflow. An example of wrapping is
```python
collect_data = Step(
    name="collect-data",
    template=PythonOPTemplate(
        CollectData,
        image="dflow:v1.0",
    ),
    parameters={
        "name": foo.inputs.parameters["name"],
    },
    artifacts={
        "iter_data": foo.inputs.artifacts["iter_data"],
        "labeled_data": bar.outputs.artifacts["labeled_data"],
    },
)
```
