Make the data validation part of the conversion step
This allows change detection to work: if the training data does not change,
the validation can be skipped entirely.
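The reasoning rests on how `spacy project run` decides whether to skip a command. It records checksums of each command's `deps` and `outputs` in `project.lock`, and a command that declares no `outputs` is always rerun. The old standalone `validate-annotations` step listed no outputs, so it ran on every invocation. Folded into `convert-annotations`, which produces `corpus/all.spacy`, it is skipped whenever that command is. The sketch below is a simplified model of that decision, not spaCy's actual code: `path_md5`, `lock_entry`, `should_rerun` and the lock-entry shape are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def path_md5(path: Path) -> Optional[str]:
    """MD5 of a file, or of every file under a directory; None if missing."""
    if not path.exists():
        return None
    digest = hashlib.md5()
    if path.is_dir():
        for f in sorted(p for p in path.rglob("*") if p.is_file()):
            # Include the relative name so renamed files also count as changes.
            digest.update(str(f.relative_to(path)).encode())
            digest.update(f.read_bytes())
    else:
        digest.update(path.read_bytes())
    return digest.hexdigest()


def lock_entry(command: dict) -> dict:
    """Snapshot of what a (hypothetical) lock file records after a run."""
    return {
        "script": list(command["script"]),
        "deps": {p: path_md5(Path(p)) for p in command.get("deps", [])},
        "outs": {p: path_md5(Path(p)) for p in command.get("outputs", [])},
    }


def should_rerun(command: dict, lock: Dict[str, dict]) -> bool:
    """Decide whether a command must run again (simplified model)."""
    previous = lock.get(command["name"])
    if previous is None:
        return True  # never run before, so there is nothing to compare
    if not command.get("outputs"):
        return True  # commands without outputs are always rerun
    # Rerun only if the script, a dependency or an output has changed.
    return lock_entry(command) != previous
```

Under this model the old `validate-annotations` entry always returns `True`, while the merged `convert-annotations` returns `False` until something under `assets/annotations` (or the script list itself) changes.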
thatbudakguy committed Jan 15, 2024
1 parent b2f3922 commit 63457bb
Showing 1 changed file with 3 additions and 8 deletions.
core_inception/project.yml: 11 changes (3 additions and 8 deletions)
```diff
@@ -29,7 +29,6 @@ workflows:
   all:
     - install-dependencies
     - install-language
-    - validate-annotations
     - convert-raw-text
     - convert-annotations
     - split-data
@@ -41,7 +40,6 @@ workflows:
     - install-dependencies
     - install-language
   setup:
-    - validate-annotations
     - convert-raw-text
     - convert-annotations
     - split-data
@@ -61,11 +59,6 @@ commands:
     deps: ["assets/lang/${vars.lang}"]
     script: ["pip install -e assets/lang/${vars.lang}"]
 
-  - name: validate-annotations
-    help: "Validate the files exported from INCEpTION"
-    deps: ["assets/annotations"]
-    script: ["python scripts/validate_annotations.py assets/annotations"]
-
   - name: convert-raw-text
     help: "Convert raw text files to spaCy's format"
     script: ["python scripts/convert_text.py assets/text corpus/ ${vars.n_sents} ${vars.lang}"]
@@ -74,7 +67,9 @@ commands:
 
   - name: convert-annotations
     help: "Convert annotated data from INCEpTION to spaCy's format"
-    script: ["python scripts/convert_annotations.py assets/annotations corpus/ ${vars.n_sents} ${vars.lang}"]
+    script:
+      - "python scripts/validate_annotations.py assets/annotations"
+      - "python scripts/convert_annotations.py assets/annotations corpus/ ${vars.n_sents} ${vars.lang}"
     deps: ["assets/annotations"]
     outputs: ["corpus/all.spacy"]
 
```
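Putting validation inside `convert-annotations` also fixes the order of the two steps: the `script` entries run one after another, and as far as I understand `spacy project run`, a non-zero exit code aborts the command, so a failed validation stops conversion before `corpus/all.spacy` is rewritten. The sketch below models only that sequencing with the standard library. `run_script` is a hypothetical helper, not part of spaCy, and the usage swaps the real `python scripts/...` lines for stand-in commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def run_script(script: Sequence[str]) -> List[int]:
    """Run each script line in order, stopping at the first failure.

    Returns the exit codes of the lines that actually ran, so a caller can
    see whether later lines (here: the conversion) were skipped.
    """
    codes: List[int] = []
    for line in script:
        # Split the line like a shell would, then run it without a shell.
        result = subprocess.run(shlex.split(line))
        codes.append(result.returncode)
        if result.returncode != 0:
            break  # e.g. validation failed, so conversion never starts
    return codes
```

For example, `run_script([validate_cmd, convert_cmd])` returns a single non-zero code when the stand-in validation fails, showing that the conversion line never ran.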
