@@ -15,34 +15,58 @@ may fail if the `generate_shared_intermediates.py` module has not been run.
## Running Computations
- Before running other modules, the `generate_shared_intermediates.py` module must be
- run to generate datasets derived from the archival data used by other modules.
+ Analysis files that generate graphs or statistics from the data are located in
+ the root directory of the project and can be run individually after the
+ environment is set up.

- ```shell script
- poetry run python generate_shared_intermediates.py
- ```
+ ### Prerequisites and environment setup

- Analysis files which generate graphs or statistics from the data are located in
- the root directory of the project, and can be run individually after the shared
- intermediates have been generated.
-
- The project has a concept of a "platform", which allows splitting data-intensive
- computation from visualization. Create a new `platform-conf.toml` file in the
- project root directory from the `example-platform-conf.toml` to customize the
- platform behavior. This can be useful when using a different server or cloud
- resources to prepare data with a local machine to generate visualizations.
- Details of the specific entries in `platform-conf.toml` are included as comments
- in the example file.
-
- ### Dependencies
-
- Dependencies for the project are managed with poetry, an external tool you may
- need to install. After checking out the repository, run `poetry shell` in the
- root directory to get a shell in a virtual environment. Then run `poetry install`
- to install all the dependencies from the lockfile. If this command gives you
- errors, you may need to install some supporting libraries (libsnappy) for your
- platform. You will need to be in the poetry shell or run commands with poetry
- run to actually run scripts with the appropriate dependencies.
+ 1. Create the `renders` directory for final graph exports and the `scratch`
+    directory for checkpointed work in progress.
+
+    ```shell script
+    mkdir renders
+    mkdir scratch
+    ```
+
+ 2. Download and extract the data (see [data](#data)).
+
+ 3. Install dependencies.
+
+    Dependencies for the project are managed with poetry, an external tool you
+    may need to install. After checking out the repository, run `poetry shell`
+    in the root directory to get a shell in a virtual environment. Then run
+    `poetry install` to install all the dependencies from the lockfile. If this
+    command gives you errors, you may need to install some supporting libraries
+    (libsnappy) for your platform. You will need to be in the poetry shell, or
+    prefix commands with `poetry run`, to actually run scripts with the
+    appropriate dependencies.
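+
+    For example, a typical first-time setup might look like the following
+    sketch (the `libsnappy-dev` package name is an assumption for Debian/Ubuntu
+    systems; other platforms will differ):
+
+    ```shell script
+    # Optional: system library that some Python dependencies build against
+    # (package name assumes Debian/Ubuntu).
+    sudo apt-get install libsnappy-dev
+
+    # Enter a virtual environment, then install the locked dependencies.
+    poetry shell
+    poetry install
+
+    # Or, without entering the shell, prefix commands with `poetry run`:
+    poetry run python generate_shared_intermediates.py
+    ```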
+
+ 4. Create a platform file.
+
+    The project has a concept of a "platform", which allows splitting
+    data-intensive computation (which you probably want to do on a server) from
+    visualization (which you might want to do on your own machine). Create a
+    new `platform-conf.toml` file in the project root directory from the
+    `example-platform-conf.toml` to customize the platform behavior. This is
+    useful when you prepare data on a server or cloud resources and generate
+    the visualizations on a local machine.
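+
+    For example, one way to create it is to copy the example file and then
+    edit the entries:
+
+    ```shell script
+    # Copy the provided template, then edit the entries described in its comments.
+    cp example-platform-conf.toml platform-conf.toml
+    ```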
+
+    The scripts are set up to generate intermediate reduced datasets in the
+    `./scratch` directory when extensive pre-computation is needed. This
+    directory can then be copied from the server to your local machine to run
+    the actual visualizations. Details of the specific entries in
+    `platform-conf.toml` are included as comments in the example file.
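+
+    For example, to pull the precomputed intermediates from a server down to
+    your local machine (the hostname and project path here are placeholders):
+
+    ```shell script
+    # Hypothetical hostname and project path; substitute your own.
+    scp -r user@compute-server:/path/to/project/scratch ./
+    ```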
+
+ 5. Generate the shared intermediate files.
+
+    Before running other modules, the `generate_shared_intermediates.py` module
+    must be run to generate datasets derived from the archival data used by
+    other modules.
+
+    ```shell script
+    poetry run python generate_shared_intermediates.py
+    ```
+
### Dask
The uncompressed dataset size as of March 2020 is too large to fit on