Add documentation related to creation and usage of the interaction test data.
khituras committed Sep 20, 2024
1 parent 4443019 commit 9757c79
Showing 4 changed files with 39 additions and 1 deletion.
8 changes: 8 additions & 0 deletions gepi/README.md
@@ -152,6 +152,14 @@ By default, the project-internal configuration file `configuration.properties.je

Important note: ***Do not edit the `README.md` file in the module roots*** if there exists a `readme-raw` subdirectory. The file in the root is just a Maven-filtered copy of the `readme-raw/README.md` file. The Maven filtering replaces Maven properties like the project version in the `readme-raw/README.md` file and puts the result in the module root, overriding the previous `README.md` file.

### Update the Interaction Test Data

The <code>gepi-core</code> and <code>gepi-webapp</code> projects use interactions in JSON format for their integration tests. These interactions have the exact form that is sent to ElasticSearch in production. In the tests, they are also sent to an ElasticSearch server, which runs in a Docker container started with the Testcontainers project. The test data is managed in the <code>gepi/gepi-test-data/</code> project, where the data files are placed.
To place the test data into the <code>gepi-core/target/generated-resources</code> directory - where it belongs - the whole GePI project must be built:
1) Navigate to the <code>gepi/gepi</code> directory of the repository.
2) Run <code>mvn clean package -DskipTests</code>.
This can also be used to update the test data if the contents of the <code>gepi-test-data</code> module have changed, e.g. due to an index schema change that made a re-creation of the test data necessary.
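
The two build steps above can be wrapped in a small script. The following is a minimal sketch, not part of the GePI project: the function names `build_gepi` and `has_test_data` and the `MVN` override variable are illustrative assumptions, and the check assumes that <code>gepi-core/target/generated-resources</code> lies directly below the directory passed as the first argument.

```shell
#!/bin/sh
# Sketch only: function names and the MVN variable are assumptions for illustration.

# Build the whole GePI project without running tests (steps 1 and 2 above).
# $1: path to the gepi/gepi directory of the checkout.
# Set MVN=echo for a dry run that only prints the Maven arguments.
build_gepi() {
    (
        cd "$1" || exit 1
        "${MVN:-mvn}" clean package -DskipTests
    )
}

# Succeed only if the generated test data directory exists and is not empty.
# $1: the same directory that was passed to build_gepi.
has_test_data() {
    dir="$1/gepi-core/target/generated-resources"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
}
```

A possible call is `build_gepi path/to/gepi && has_test_data path/to/gepi`, where the path is a placeholder for the local checkout.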

### Update version

Update the new version number in the following places:
19 changes: 18 additions & 1 deletion gepi/gepi-indexing/gepi-indexing-testdata/README.md
@@ -1,4 +1,21 @@
# GePI Testdata Creation Pipeline

Currently in roadworks.
This pipeline creates molecular interaction examples for the creation of the Maven artifact <code>de.julielab:gepi-test-data:&lt;version></code>.
The project for the artifact itself is located under <code>gepi/gepi-test-data</code> in this repository.

This is a JCoRe pipeline that currently expects an existing Postgres database that contains 100 specific PubMed documents pre-processed with the PubMed preprocessing pipeline found at <code>gepi/gepi-preprocessing</code>. The list of documents that should be pre-processed is found in <code>gepi/gepi-test-data/src/main/resources/test-index-input/test_pmid.txt</code>.
The current test data resides there, too. However, if the index format changes and fields are added or adapted, the interaction data must be re-created with this pipeline.

A run of this pipeline with an updated index schema follows these steps:

1) Clear GePI-Artifacts (because they contain the indexing code) from the <code>lib/</code> directory.
2) Clear the <code>data/output-json</code> directory of this project, if the directory exists.
3) Add the updated GePI-Artifacts with Maven:

    cd ..
    mvn clean package -pl gepi-indexing-testdata --also-make
4) Run the pipeline with the JCoRe pipeline runner.
5) Now the new interaction data should be available in <code>data/output-json</code>. From there it can be copied into the <code>gepi-test-data</code> project to update the test data.
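
The file-handling parts of these steps can be sketched as shell functions. This is a hedged illustration rather than a script shipped with the project: the jar pattern `gepi-*.jar` for the GePI artifacts in <code>lib/</code> is an assumption, as are all function names and the `MVN` override variable. Step 4 is left out because the exact invocation of the JCoRe pipeline runner is not given here, and the copy destination inside <code>gepi-test-data</code> is left to the caller.

```shell
#!/bin/sh
# Sketch only: the jar pattern "gepi-*.jar" and all function names are assumptions.

# Steps 1 and 2: remove stale GePI artifacts and old pipeline output.
# $1: path to the gepi-indexing-testdata project directory.
clean_pipeline_dir() {
    rm -f "$1"/lib/gepi-*.jar
    rm -rf "$1/data/output-json"
}

# Step 3: rebuild the artifacts from the parent directory.
# Set MVN=echo for a dry run that only prints the Maven arguments.
rebuild_artifacts() {
    (
        cd "$1/.." || exit 1
        "${MVN:-mvn}" clean package -pl gepi-indexing-testdata --also-make
    )
}

# Step 5: copy the new interaction files to a destination chosen by the caller.
# $1: project directory, $2: destination directory.
copy_output() {
    mkdir -p "$2" && cp -R "$1/data/output-json/." "$2/"
}
```

Running the pipeline itself (step 4) would happen between `rebuild_artifacts` and `copy_output`.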



5 changes: 5 additions & 0 deletions gepi/gepi-test-data/README.md
@@ -0,0 +1,5 @@
# GePI Test Data

This is a resource-only project. It delivers pre-processed interaction items in JSON format for the <code>gepi-core</code> module. There, it is used to create a test index in an ElasticSearch instance running in a Docker container.

The data in this project was created using the <code>gepi-indexing/gepi-indexing-testdata</code> pipeline.
8 changes: 8 additions & 0 deletions gepi/readme-raw/README.md
@@ -152,6 +152,14 @@ By default, the project-internal configuration file `configuration.properties.je

Important note: ***Do not edit the `README.md` file in the module roots*** if there exists a `readme-raw` subdirectory. The file in the root is just a Maven-filtered copy of the `readme-raw/README.md` file. The Maven filtering replaces Maven properties like the project version in the `readme-raw/README.md` file and puts the result in the module root, overriding the previous `README.md` file.

### Update the Interaction Test Data

The <code>gepi-core</code> and <code>gepi-webapp</code> projects use interactions in JSON format for their integration tests. These interactions have the exact form that is sent to ElasticSearch in production. In the tests, they are also sent to an ElasticSearch server, which runs in a Docker container started with the Testcontainers project. The test data is managed in the <code>gepi/gepi-test-data/</code> project, where the data files are placed.
To place the test data into the <code>gepi-core/target/generated-resources</code> directory - where it belongs - the whole GePI project must be built:
1) Navigate to the <code>gepi/gepi</code> directory of the repository.
2) Run <code>mvn clean package -DskipTests</code>.
This can also be used to update the test data if the contents of the <code>gepi-test-data</code> module have changed, e.g. due to an index schema change that made a re-creation of the test data necessary.

### Update version

Update the new version number in the following places:
