
WIP: Ingest Gregory of Nyssa treebanks #186

Draft: wants to merge 12 commits into base: main


jacobwegner (Contributor)

No description provided.

jacobwegner (Contributor Author) left a comment:

@jtauber Here are a few notes about Gio's data.

As deployed to https://preview.scaife-viewer.eldarion.com/nyssa-di-russo, the "CTS Versions" are at the sentence level. This was easier than mapping to the TEI XML from OpenGreekAndLatin/First1KGreek, but I'm not sure whether that will be suitable for Greg.
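A minimal sketch of what sentence-level CTS Versions amount to, assuming each blank-line-separated CoNLL-U sentence becomes one passage. The PR imports the conllu package; this sketch swaps in a small stdlib parser so it runs on its own. The function names, the version URN suffix (gio-grc1) and the ":<sentence number>" passage scheme are hypothetical, not taken from the deployed data.

```python
# Sketch only: reconstitute sentence-level passages from CoNLL-U text.
# Stdlib stand-in for the `conllu` package; names and the passage
# scheme (version URN + ":<sentence number>") are hypothetical.


def iter_sentences(conllu_text):
    """Yield (metadata, forms) for each blank-line-separated sentence."""
    metadata, forms = {}, []
    for line in conllu_text.splitlines():
        line = line.rstrip()
        if not line:
            # A blank line closes the current sentence block.
            if forms:
                yield metadata, forms
            metadata, forms = {}, []
        elif line.startswith("#"):
            # Comment lines such as "# sent_id = 1" carry sentence metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            # Keep ordinary word lines; skip multi-word ranges ("1-2")
            # and empty nodes ("1.1").
            if cols[0].isdigit():
                forms.append(cols[1])
    if forms:
        # Handle input that does not end with a blank line.
        yield metadata, forms


def sentence_passages(conllu_text, version_urn):
    """Return (passage_urn, text) pairs, one passage per treebank sentence."""
    return [
        (f"{version_urn}:{n}", " ".join(forms))
        for n, (_metadata, forms) in enumerate(iter_sentences(conllu_text), start=1)
    ]
```

Numbering passages by sentence position is what makes this easy compared with mapping into the First1KGreek TEI, and also why a later remap to the canonical reference scheme would be needed.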

)


GIO_DATA_DIR = Path(

import conllu


GIO_DATA_DIR = Path(

@@ -231,6 +231,32 @@ def create_glaux_collection(reset=False):
tas.update(collection=collection)


def create_gio_collection(reset=False):

main()

# remaining steps
# - [ ] attribution records
jacobwegner (Contributor Author):

See load_boano_metadata for a possible implementation.

Like in #185 (comment), I would expect this to be made available as syntax tree collection metadata.

main()

# remaining steps
# - [ ] populate library metadata
jacobwegner (Contributor Author):

We might want to bring more metadata over from https://scaife.perseus.org/library/urn:cts:greekLit:tlg2022.tlg007/.

The "gio" prefix in the CTS Versions indicates that we're using the sentences from the treebanks; this would need to be mapped to the canonical reference scheme of chapter

# - [x] extract flat text files
# - [x] create token files
# - [x] write alignment files
# - [ ] attribution records
jacobwegner (Contributor Author):

See the comment below about attributions; cf. load_boano_metadata.

jacobwegner (Contributor Author):

Metadata stubbed from https://github.com/OpenGreekAndLatin/First1KGreek/blob/5010767bd4f1029c762a410511ca765832b66154/data/tlg2022/tlg007/__cts__.xml

As noted below, the versions here are reconstituted from Gio's data; I have not mapped them to the CTS Versions from OpenGreekAndLatin/First1KGreek.
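A minimal sketch of how metadata could be stubbed from a work-level __cts__.xml, assuming the CapiTainS-style layout (a ti:work root with ti:title and ti:edition children in the http://chs.harvard.edu/xmlns/cts namespace). The function name work_metadata and the returned keys are hypothetical; this is not the code in the PR.

```python
import xml.etree.ElementTree as ET

# Namespace used by CapiTainS/CTS __cts__.xml inventory files.
CTS_NS = {"ti": "http://chs.harvard.edu/xmlns/cts"}
# ElementTree expands the reserved xml: prefix to this Clark-notation name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def work_metadata(cts_xml):
    """Pull work URN, group URN, titles (keyed by xml:lang) and edition URNs."""
    root = ET.fromstring(cts_xml)
    titles = {
        title.get(XML_LANG): (title.text or "").strip()
        for title in root.findall("ti:title", CTS_NS)
    }
    editions = [
        edition.get("urn") for edition in root.findall("ti:edition", CTS_NS)
    ]
    return {
        "urn": root.get("urn"),
        "group_urn": root.get("groupUrn"),
        "titles": titles,
        "editions": editions,
    }
```

Because the versions here are reconstituted from Gio's data, the edition URNs found this way would be reference points for a later mapping rather than the versions actually ingested.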

jacobwegner (Contributor Author):

Extracted via backend/scaife_stack_atlas/extractors/extract_gio_alignments.py
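A hedged sketch of the kind of token and alignment records such an extractor might write. The record shapes, field names (passage, position, form, sent_id, token_count) and the JSON Lines output are hypothetical and are not the format of extract_gio_alignments.py; the input is assumed to be a list of (sent_id, forms) pairs from a CoNLL-U reader.

```python
import json

# Sketch only: hypothetical record shapes, not the PR's file format.
# `sentences` is a list of (sent_id, forms) pairs in treebank order.


def token_records(version_urn, sentences):
    """One record per word, addressed by sentence passage and 1-based position."""
    return [
        {"passage": f"{version_urn}:{n}", "position": position, "form": form}
        for n, (_sent_id, forms) in enumerate(sentences, start=1)
        for position, form in enumerate(forms, start=1)
    ]


def alignment_records(version_urn, sentences):
    """Link each sentence-level passage back to its treebank sentence."""
    return [
        {"passage": f"{version_urn}:{n}", "sent_id": sent_id, "token_count": len(forms)}
        for n, (sent_id, forms) in enumerate(sentences, start=1)
    ]


def write_jsonl(path, records):
    """Write records as JSON Lines, keeping Greek characters unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping both the passage URN and the treebank sent_id in each alignment record would make a later remap to the canonical chapter scheme a lookup rather than a re-parse.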

jacobwegner (Contributor Author):

Extracted via backend/scaife_stack_atlas/extractors/extract_gio_alignments.py
