Build

Bring-your-own-Bible

  1. Ensure OntoNotes is located at data/ontonotes.

  2. Provide your verses in our TSV format. See our sample, and refer to consts.py for book abbreviations.

  3. Modify the config so that its paths point to your data.

  4. Run the conversion: tango run conf/main.jsonnet
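
Before launching the conversion, it can help to confirm that the inputs from steps 1 and 2 are in place. The following is a minimal sketch, not part of the repository: the function name `check_inputs` is hypothetical, and only the `data/ontonotes` location comes from the steps above.

```shell
# Hypothetical pre-flight check; not part of PrOnto.
# Usage: check_inputs <repo_root> <path_to_verse_tsv>
# Returns 0 when both inputs look usable, 1 otherwise.
check_inputs() {
  local root="$1" tsv="$2" status=0
  # Step 1: OntoNotes must be present under data/ontonotes.
  if [ ! -d "$root/data/ontonotes" ]; then
    echo "missing directory: $root/data/ontonotes" >&2
    status=1
  fi
  # Step 2: the verse TSV must exist and be non-empty.
  if [ ! -s "$tsv" ]; then
    echo "missing or empty TSV: $tsv" >&2
    status=1
  fi
  return "$status"
}
```

One possible use is to gate the run on the check, for example `check_inputs . path/to/your-verses.tsv && tango run conf/main.jsonnet`, where the TSV path is a placeholder for wherever your file lives.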

Open-access eBible.org Corpus

  1. Ensure OntoNotes is located at data/ontonotes.

  2. Obtain the source USFX files and place them in data/usfx, where the commands below expect them. Please contact Luke Gessler ([email protected]) for the file.

  3. Convert the USFX files into PrOnto's TSV format:

mkdir -p data/tsv
# Option 1: run in serial
for x in $(cut -f1 data/languages.tsv); do
  python -m pronto.scripts.usfx_to_tsv "data/usfx/${x}_usfx.xml" "data/tsv/${x}-bible.tsv"
done
# Option 2: run in parallel (requires GNU parallel)
rm -f commands.txt  # start from an empty command list
for x in $(cut -f1 data/languages.tsv); do
  echo "python -m pronto.scripts.usfx_to_tsv data/usfx/${x}_usfx.xml data/tsv/${x}-bible.tsv" >> commands.txt
done
parallel < commands.txt

  4. Construct datasets:

rm -f commands.txt
mkdir -p output_log
# Allow one run to go to completion first
LANGUAGE=gulNT tango run conf/main.jsonnet > output_log/gulNT
for x in $(cut -f1 data/languages.tsv); do
  echo "LANGUAGE=$x tango run conf/main.jsonnet > output_log/$x" >> commands.txt
done
parallel < commands.txt
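
Both loops above take their language codes from the first column of data/languages.tsv, so one way to see which files are still missing is to compare that column against what is on disk. This is a minimal sketch that assumes the file layout used in the commands above (data/usfx/<code>_usfx.xml and data/tsv/<code>-bible.tsv). The function name `list_missing` is hypothetical and not part of the repository.

```shell
# Hypothetical helper; not part of PrOnto.
# Prints each language code from column 1 of <languages_tsv> for which
# <dir>/<code><suffix> is missing or empty.
# Usage: list_missing <languages_tsv> <dir> <suffix>
list_missing() {
  local langs="$1" dir="$2" suffix="$3" code
  cut -f1 "$langs" | while IFS= read -r code; do
    [ -n "$code" ] || continue                   # skip blank lines
    [ -s "$dir/$code$suffix" ] || echo "$code"   # report missing or empty files
  done
}
```

For example, `list_missing data/languages.tsv data/usfx _usfx.xml` would list languages whose USFX source has not been obtained yet, and `list_missing data/languages.tsv data/tsv -bible.tsv` would list languages whose conversion has not produced a TSV.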