WIP: Ingest Gregory of Nyssa treebanks #186
base: main
Sources:
* https://github.com/OpenGreekAndLatin/First1KGreek/tree/ca6b83d057151e3dd0adc55237a14808877ea4db/data/tlg2022
* https://github.com/OpenGreekAndLatin/First1KGreek/tree/ca6b83d057151e3dd0adc55237a14808877ea4db/data/tlg2022/tlg007
* https://github.com/gregorycrane/nicenefathers/blob/master/npnf207.xml
@jtauber Here are a few notes about Gio's data.
As deployed to https://preview.scaife-viewer.eldarion.com/nyssa-di-russo, the "CTS Versions" are at the sentence level. This was easier than mapping to the TEI XML from OpenGreekAndLatin/First1KGreek, but I'm not sure whether that will be suitable for Greg.
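To illustrate what sentence-level versions could look like, here is a minimal sketch, assuming the treebank is CoNLL-U. It is not the extractor from this PR: it uses only the standard library rather than the `conllu` package, and the helper names, the sample sentences, and the version URN passed in the usage note are all hypothetical.

```python
from typing import Iterator, List

# Tiny made-up CoNLL-U sample (hypothetical data, not taken from the treebank).
SAMPLE = """\
# sent_id = 1
1\tἐν\tἐν\tADP\t_\t_\t2\tcase\t_\t_
2\tἀρχῇ\tἀρχή\tNOUN\t_\t_\t0\troot\t_\t_

# sent_id = 2
1\tλόγος\tλόγος\tNOUN\t_\t_\t0\troot\t_\t_
"""


def iter_sentences(conllu_text: str) -> Iterator[List[str]]:
    """Yield the word forms of each sentence in a CoNLL-U string."""
    forms: List[str] = []
    for raw_line in conllu_text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence.
            if forms:
                yield forms
                forms = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# sent_id = 1"
        columns = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(columns) < 2 or not columns[0].isdigit():
            continue
        forms.append(columns[1])  # FORM is the second CoNLL-U column
    if forms:
        yield forms


def sentence_level_lines(conllu_text: str, version_urn: str) -> List[str]:
    """Return one "<urn><n> <text>" line per sentence, numbered from 1."""
    return [
        f"{version_urn}{index} {' '.join(forms)}"
        for index, forms in enumerate(iter_sentences(conllu_text), start=1)
    ]
```

Calling `sentence_level_lines(SAMPLE, "urn:cts:greekLit:tlg2022.tlg007.gio-example:")` (a made-up version URN) yields one line per treebank sentence, so the passage reference is just the sentence number rather than a reference taken from the TEI XML.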
)


GIO_DATA_DIR = Path(
import conllu


GIO_DATA_DIR = Path(
@@ -231,6 +231,32 @@ def create_glaux_collection(reset=False):
tas.update(collection=collection)


def create_gio_collection(reset=False):
See #185 (comment)
main()


# remaining steps
# - [] attribution records
See load_boano_metadata for a possible implementation.
Like in #185 (comment), I would expect this to be made available as syntax tree collection metadata.
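As a rough illustration of attribution records exposed as syntax tree collection metadata, here is a minimal sketch. Everything in it is an assumption made for illustration: the `AttributionRecord` fields, the `build_collection_metadata` helper, and the shape of the resulting dictionary are hypothetical, and it does not reproduce `load_boano_metadata`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class AttributionRecord:
    """Hypothetical attribution record: who did what for the collection."""

    role: str
    person: str
    organization: str = ""


def build_collection_metadata(
    urn: str, label: str, attributions: List[AttributionRecord]
) -> Dict[str, Any]:
    """Nest the attribution records under the collection's metadata."""
    return {
        "urn": urn,
        "label": label,
        "metadata": {
            "attributions": [asdict(record) for record in attributions],
        },
    }
```

For example, `build_collection_metadata("urn:example:collection", "Example treebank", [AttributionRecord(role="Annotator", person="A. N. Other")])` returns a plain dictionary that could be serialized alongside the collection; the URN, label, and name in that call are placeholders.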
main()


# remaining steps
# - [ ] populate library metadata
We might want to bring more metadata over from https://scaife.perseus.org/library/urn:cts:greekLit:tlg2022.tlg007/.
The "gio" prefix in the CTS Versions indicates that we're using the sentences from the treebanks; this would need to be mapped to the canonical reference scheme of chapter
# - [x] extract flat text files
# - [x] create token files
# - [x] write alignment files
# - [ ] attribution records
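The "create token files" step in the checklist above could be sketched roughly as follows. This is not the PR's implementation: the column names (`text_part_urn`, `position`, `value`) and both helper functions are hypothetical, and the sketch assumes each sentence has already been reduced to a list of word forms.

```python
import csv
import io
from typing import Dict, List, Union

TokenRow = Dict[str, Union[str, int]]
FIELDNAMES = ["text_part_urn", "position", "value"]  # hypothetical columns


def token_rows(sentences: List[List[str]], version_urn: str) -> List[TokenRow]:
    """Build one row per token, keyed by a sentence-level reference."""
    rows: List[TokenRow] = []
    for sentence_index, forms in enumerate(sentences, start=1):
        for position, form in enumerate(forms, start=1):
            rows.append(
                {
                    "text_part_urn": f"{version_urn}{sentence_index}",
                    "position": position,  # 1-based position within the sentence
                    "value": form,
                }
            )
    return rows


def write_token_csv(rows: List[TokenRow]) -> str:
    """Serialize token rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the sentence reference on every row means each token can be traced back to the sentence-level text part it came from.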
See the comment below about attributions; cf. load_boano_metadata.
Metadata stubbed from https://github.com/OpenGreekAndLatin/First1KGreek/blob/5010767bd4f1029c762a410511ca765832b66154/data/tlg2022/tlg007/__cts__.xml
As noted below, the versions here are reconstituted from Gio's data; I have not mapped them to the CTS Versions from OpenGreekAndLatin/First1KGreek.
Extracted via backend/scaife_stack_atlas/extractors/extract_gio_alignments.py
Extracted via backend/scaife_stack_atlas/extractors/extract_gio_alignments.py
No description provided.