Add dvc pipelines #14

Open
wants to merge 42 commits into
base: master
Commits
2ab395d
Added dvc to repo and tiler stage
radistoubalidis Jun 22, 2022
fd95c6e
Added untracked files
radistoubalidis Jun 22, 2022
c6dd4c9
add reclner2gcbm stage
radistoubalidis Jun 22, 2022
c032a07
added add_species_vol_to_bio stage
radistoubalidis Jun 22, 2022
bab3896
added modify_root_parameters stage
radistoubalidis Jun 22, 2022
6537a28
added modify_turnover_parameters stage
radistoubalidis Jun 22, 2022
1015897
added modify_decay_parameters stage
radistoubalidis Jun 22, 2022
7758687
added modify_spinup_parameters stage
radistoubalidis Jun 22, 2022
5929812
added update_GCBM_configuratrion stage
radistoubalidis Jun 22, 2022
351cd31
added run_gcbm stage
radistoubalidis Jun 22, 2022
4bbfe3c
added create_tiffs and compile_results stages
radistoubalidis Jun 23, 2022
bdadf23
stop tracking Standalone_GCBM\logs\tiler_log.txt
radistoubalidis Jun 23, 2022
da83da7
stop tracking Standalone_GCBM\logs\create_tiffs.log
radistoubalidis Jun 23, 2022
c53ef52
stop tracking Standalone_GCBM\logs\update_gcbm_config.log
radistoubalidis Jun 23, 2022
0ec2f53
stop tracking Standalone_GCBM\logs\Moja_Debug.log
radistoubalidis Jun 23, 2022
807c70d
added dvc repo on gdrive to track log files
radistoubalidis Jun 27, 2022
adc8f34
stoped tracking processed_output/spatial
radistoubalidis Jun 28, 2022
4b24f60
added spatial output to remote storage
radistoubalidis Jun 28, 2022
d030f2d
updates dependencies in compile_results,run_gcbm stages
radistoubalidis Jun 29, 2022
9ddf387
updates dependencies in multiple stages
radistoubalidis Jun 29, 2022
5ff3db1
adds post_processing stage in pipeline
radistoubalidis Jun 30, 2022
6dc2a69
updates deps in post_processing stage
radistoubalidis Jun 30, 2022
e2aa8f4
updates multiple stages dependencies
radistoubalidis Jun 30, 2022
d2a83d6
adds script that calculates mean values for each indicator for each L…
radistoubalidis Jul 1, 2022
2195c6d
updates json format of metrics
radistoubalidis Jul 1, 2022
1a782cb
adds overview for ArchiveIndex_Beta_Install.mdb
radistoubalidis Jul 4, 2022
f6a9087
updates overview.ipynb
radistoubalidis Jul 4, 2022
ef35803
adds dvc pipeline readme
radistoubalidis Jul 4, 2022
87ed4ee
updates dvcpipeline readme
radistoubalidis Jul 4, 2022
e866830
updates dvc pipeline readme
radistoubalidis Jul 4, 2022
3ad7d94
updates dvc pipeline readme
radistoubalidis Jul 4, 2022
cb38bcd
adds export to csv option in query
radistoubalidis Jul 5, 2022
237d2a5
declares and uses python path as var in dvc.yml
radistoubalidis Jul 5, 2022
7944bde
updates pipeline readme
radistoubalidis Jul 5, 2022
b1fa0de
updates analyze.py to calculate metrics for every 50 years and adds i…
radistoubalidis Jul 5, 2022
5296555
refactor python paths in bat files back to original
radistoubalidis Jul 5, 2022
6c11636
modifies analyze.py to update metrics in post_processing stage
radistoubalidis Jul 7, 2022
3657937
casts json values as floats so dvc can recognize them
radistoubalidis Jul 7, 2022
7775c63
implemented clean run
radistoubalidis Jul 12, 2022
fd9b236
sets logging to write in file on modify scripts for dvc pipeline stag…
radistoubalidis Jul 14, 2022
347aec5
update pipeline.md to current changes
radistoubalidis Jul 15, 2022
acb98ba
update link to video demonstration on pipeline.md
radistoubalidis Sep 7, 2022
update pipeline.md to current changes
Signed-off-by: radis toubalidis <rtoumpalidis@gmail.com>
radistoubalidis committed Jul 15, 2022
commit 347aec52652acd99bb0f13cac4365218470b5d32
51 changes: 43 additions & 8 deletions pipeline.md
@@ -3,12 +3,15 @@


Reminder :
- First, install [dvc](https://dvc.org/doc/install/windows)
- Set up your default remote storage [(help here)](https://dvc.org/doc/command-reference/remote/add)
- In `dvc.yaml`, on line 2, set `python.path:<your_local_python37_path>`
- In every stage that has an outputs field, dvc tracks the listed files (e.g. `logs\tiler_log.txt` in the `tiler` stage), so after a stage with outputs completes, run `dvc push` to store the files in your default remote
- After you have installed the requirements from [README](https://github.com/radistoubalidis/GCBM.Belize/blob/master/README.md "README" ), you can test the pipeline with:
- First, install [dvc](https://dvc.org/doc/install/windows) along with the driver for the remote storage you want to upload your files to (e.g. Google Drive)
- Set up your default remote storage [(help here)](https://dvc.org/doc/command-reference/remote/add)
- In `dvc.yaml`, in the `vars` field, set `python.path:<your_local_python37_path>` and `R_path:<your_local_R_path>`
- After you have installed the requirements from [README](https://github.com/radistoubalidis/GCBM.Belize/blob/master/README.md "README" ), you can test the pipeline by running:
- `dvc repro` or `dvc exp run`
- By default, dvc does not impose an order on the pipeline stages. It orders two stages only when the output `x` of stage `i` is listed as a dependency of stage `i+1`. By creating a log file for each stage and adding it as a dependency of the next stage, we make the pipeline execute in order.
- You can see the metrics that the `analyze.py` script creates in the `post_processing` stage by running `dvc metrics show`
- After the pipeline has run and you have set up your remote storage, run `dvc push`; for every stage, the files listed in the `outs` field are pushed to your remote storage.
- There is a demonstration video and a Jupyter Notebook that iterates over the compiled spatial output and displays an example [here](https://drive.google.com/drive/folders/1p4PzaacNU6rddXWuljtaO1_G1LIoiw-2?usp=sharing "here")
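
The log-file chaining described in the reminders can be illustrated with a small sketch. This is not how dvc is implemented; it only shows why listing one stage's log as the next stage's dependency fixes a unique run order. The stage names and log paths come from this document, while the `STAGES` dict layout and the `execution_order` helper are hypothetical simplifications.

```python
from collections import deque

# Stage names and log paths are taken from pipeline.md. The dict layout is a
# hypothetical simplification of dvc.yaml, not dvc's internal representation.
# The stages are deliberately listed out of order.
STAGES = {
    "modify_root_parameters": {
        "deps": ["logs/add_species_vol_to_bio.log"],
        "outs": ["logs/modify_root_parameters.log"],
    },
    "tiler": {"deps": [], "outs": ["logs/tiler_log.txt"]},
    "add_species_vol_to_bio": {
        "deps": ["logs/recliner_log.txt"],
        "outs": ["logs/add_species_vol_to_bio.log"],
    },
    "recliner2gcbm": {
        "deps": ["logs/tiler_log.txt"],
        "outs": ["logs/recliner_log.txt"],
    },
}


def execution_order(stages):
    """Return stage names so that each stage runs after the producers of its deps."""
    # Map every output file to the stage that produces it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    # upstream[name] holds the stages whose outputs `name` still waits for.
    upstream = {
        name: {producer[d] for d in s["deps"] if d in producer}
        for name, s in stages.items()
    }
    ready = deque(sorted(n for n, ups in upstream.items() if not ups))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for name, ups in upstream.items():
            if current in ups:
                ups.remove(current)
                if not ups:
                    ready.append(name)
    if len(order) != len(stages):
        raise ValueError("cycle detected between stages")
    return order


print(execution_order(STAGES))
```

Without the log dependencies, several stages would have no upstream stage at all and could start in any order; with them, each stage has exactly one predecessor.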

#### Tiler
> Working Directory: `Standalone_GCBM\layers\tiled`
@@ -26,60 +29,81 @@ Reminder :
#### recliner2gcbm_x64
> Working Directory: `Standalone_GCBM`

>Command: `tools\Recliner2GCBM-x64\Recliner2GCBM.exe -c input_database\recliner2gcbm_config.json`
>Command: `tools\Recliner2GCBM-x64\Recliner2GCBM.exe -c input_database\recliner2gcbm_config.json >> logs\recliner_log.txt`

>Dependencies(Relative paths):
- `logs\tiler_log.txt`
- `input_database\gcbm_input.db`
- `tools\recliner2gcbm-x64\recliner2gcbm.exe`
- `input_database\Growth_Curves.csv`
- `input_database\recliner2gcbm_config.json`
- `input_database\ArchiveIndex_Beta_Install.mdb`
>Outputs:
- `logs\recliner_log.txt`

#### add_species_vol_to_bio
> Working Directory: `Standalone_GCBM`

>Command: `python input_database\add_species_vol_to_bio.py input_database\gcbm_input.db`

>Dependencies(Relative paths):
- `logs\recliner_log.txt`
- `input_database\add_species_vol_to_bio.py`
- `input_database\gcbm_input.db`
>Outputs:
- `logs\add_species_vol_to_bio.log`

#### modify_root_parameters
> Working Directory: `Standalone_GCBM`

>Command: `python input_database\modify_root_parameters.py input_database\gcbm_input.db`

>Dependencies(Relative paths):
- `logs\add_species_vol_to_bio.log`
- `input_database\modify_root_parameters.py`
- `input_database\gcbm_input.db`
>Outputs:
- `logs\modify_root_parameters.log`

#### modify_decay_parameters
> Working Directory: `Standalone_GCBM`

>Command: `python input_database\modify_decay_parameters.py input_database\gcbm_input.db`

>Dependencies(Relative paths):
- `logs\modify_root_parameters.log`
- `input_database\modify_decay_parameters.py`
- `input_database\gcbm_input.db`

>Outputs:
- `logs\modify_decay_parameters.log`

#### modify_turnover_parameters
> Working Directory: `Standalone_GCBM`

>Command: `python input_database\modify_turnover_parameters.py input_database\gcbm_input.db`

>Dependencies(Relative paths):
- `logs\modify_decay_parameters.log`
- `input_database\modify_turnover_parameters.py`
- `input_database\gcbm_input.db`

>Outputs:
- `logs\modify_turnover_parameters.log`

#### modify_spinup_parameters
> Working Directory: `Standalone_GCBM`

>Command: `python input_database\modify_spinup_parameters.py input_database\gcbm_input.db`

>Dependencies(Relative paths):
- `logs\modify_turnover_parameters.log`
- `input_database\modify_spinup_parameters.py`
- `input_database\gcbm_input.db`

>Outputs:
- `logs\modify_spinup_parameters.log`

#### update_GCBM_Configuration
> Working Directory: `Standalone_GCBM\gcbm_project`

@@ -101,6 +125,7 @@ Reminder :
>Command: `run_gcbm.bat`

>Dependencies(Relative paths):
- `..\logs\update_gcbm_config.log`
- `run_gcbm.bat`
- `..\tools\GCBM\moja.cli.exe`
- `gcbm_config.cfg`
@@ -116,6 +141,7 @@ Reminder :
>Command: `create_tiffs.bat`

>Dependencies(Relative paths):
- `..\..\logs\Moja_Debug.log`
- `create_tiffs.bat`
- `create_tiffs.py`
- `..\..\gcbm_project\output`
@@ -130,21 +156,30 @@ Reminder :
>Command: `compileGCBMResults.bat`

>Dependencies(Relative paths):
- `..\..\logs\create_tiffs.log`
- `compileGCBMResults.bat`
- `compileresults.py`
- `compileresults.json`
- `..\..\gcbm_project\output\gcbm_output.db`
- `..\..\processed_output\compiled_gcbm_output.db`

>Outputs:
- `..\..\logs\compile_results.log`

#### post_processing
>Working Directory: `Postprocessing`

>Command: `C:\Develop\R-4.1.3\bin\R.exe CMD BATCH Summarize_DOM_Stocks.R`
>Commands:
- `${R_path} CMD BATCH Summarize_DOM_Stocks.R`
- `${python_path} analyze.py`

>Dependencies(Relative paths):
- `..\Standalone_GCBM\logs\compile_results.log`
- `Summarize_DOM_Stocks.R`
- `..\Standalone_GCBM\processed_output\compiled_gcbm_output.db`
- `Tables`
- `Rplots.pdf`

>Output Plots: `./Figures`
>Outputs:
- Plots: `Postprocessing\Figures`
- Metrics: `Postprocessing\Metrics`
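
One commit in this pull request casts the metrics JSON values to floats so that dvc can recognize them. The actual `analyze.py` is not shown here, so the following is only a minimal sketch of that idea. The `build_metrics` helper and the indicator names and values are hypothetical.

```python
import json
from decimal import Decimal


def build_metrics(indicator_means):
    """Return a JSON string whose values are all plain floats."""
    # float() accepts numeric strings, Decimal values and ints alike, so
    # every value is written to the metrics file as a JSON number.
    metrics = {name: float(value) for name, value in indicator_means.items()}
    return json.dumps(metrics, indent=2)


# Hypothetical indicator names and values, for illustration only.
example = {"indicator_a": "12.5", "indicator_b": Decimal("3.25"), "indicator_c": 7}
metrics_json = build_metrics(example)
print(metrics_json)
```

Writing a string like this to the metrics file declared in the `post_processing` stage is what would let `dvc metrics show` display the values.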