- Ensure OntoNotes is located at data/ontonotes.
- Provide your verses in our TSV format. See our sample, and refer to consts.py for book abbreviations.
- Modify the config so that the appropriate paths are set.
- Run the conversion:
tango run conf/main.jsonnet
- Ensure OntoNotes is located at data/ontonotes.
- Download source USFX files. Please contact Luke Gessler ([email protected]) for the file.
- Convert the USFX files into PrOnto's TSV format:
mkdir -p data/tsv
# Option 1: run in serial
for x in `cut -f1 data/languages.tsv`; do
python -m pronto.scripts.usfx_to_tsv data/usfx/${x}_usfx.xml data/tsv/${x}-bible.tsv;
done
# Option 2: run in parallel
for x in `cut -f1 data/languages.tsv`; do
echo "python -m pronto.scripts.usfx_to_tsv data/usfx/${x}_usfx.xml data/tsv/${x}-bible.tsv;" >> commands.txt
done
parallel < commands.txt
- Construct datasets:
rm -f commands.txt
mkdir -p output_log
# Allow one run to go to completion first
LANGUAGE=gulNT tango run conf/main.jsonnet > output_log/gulNT
for x in `cut -f1 data/languages.tsv`; do
echo "LANGUAGE=$x tango run conf/main.jsonnet > output_log/$x" >> commands.txt;
done
parallel < commands.txt
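Because the parallel runs can take a long time, it may help to check the inputs before starting. The following is a minimal sketch of such a check, not part of PrOnto itself: the helper names missing_usfx and missing_tsv are hypothetical, and the sketch assumes only what the commands above already imply, namely that the first column of data/languages.tsv holds the language codes and that files are named ${x}_usfx.xml and ${x}-bible.tsv.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of PrOnto).
# They reuse the file-naming pattern from the conversion commands above.

# Print each language code (column 1 of the languages TSV) that has
# no <code>_usfx.xml file in the given USFX directory.
missing_usfx() {
  cut -f1 "$1" | while IFS= read -r code; do
    [ -n "$code" ] || continue                      # skip blank lines
    [ -f "$2/${code}_usfx.xml" ] || printf '%s\n' "$code"
  done
}

# Print each language code whose converted <code>-bible.tsv file
# is missing or empty in the given TSV directory.
missing_tsv() {
  cut -f1 "$1" | while IFS= read -r code; do
    [ -n "$code" ] || continue                      # skip blank lines
    [ -s "$2/${code}-bible.tsv" ] || printf '%s\n' "$code"
  done
}

# Only run the checks when the layout described above is present.
if [ -d data/ontonotes ] && [ -f data/languages.tsv ]; then
  missing_usfx data/languages.tsv data/usfx
  missing_tsv data/languages.tsv data/tsv
else
  echo "data/ontonotes or data/languages.tsv not found" >&2
fi
```

An empty output means every language listed in data/languages.tsv has both a USFX source file and a non-empty converted TSV. Any code that is printed should be resolved before running the dataset construction step.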