DAILP Ingest is a tool that reads Cherokee data from a set of spreadsheets and makes HTTP calls to upload those data to an Online Linguistic Database (OLD) instance.
Clone the source from GitHub:
$ git clone https://github.com/dativebase/dailp-ingest-clj.git
Using Leiningen, create the standalone .jar file:
$ lein uberjar
Note: you must have Leiningen and Clojure installed.
To ingest the data from the DAILP Google Sheet sources to a specific OLD instance, supply a valid OLD instance's URL, username and password:
$ java -jar target/uberjar/dailp-ingest-clj-0.1.0-SNAPSHOT-standalone.jar \
      https://some.domain.com/path/to/old/instance/ \
      someusername \
      somepassword
For example, if you are running a local OLD instance using the DativeBase Docker Compose deployment method, then the following will probably work:
$ java -jar target/uberjar/dailp-ingest-clj-0.1.0-SNAPSHOT-standalone.jar \
      http://127.0.0.1:61001/old/ \
      admin \
      adminA_1
Alternatively, use lein run:
$ lein run \
      https://some.domain.com/path/to/old/instance/ \
      someusername \
      somepassword
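Since the ingest uploads data over HTTP, it can be convenient to catch obviously wrong arguments before starting the JVM. The following is a minimal sketch of such a wrapper, not something shipped with this repository: the names `run_ingest` and `JAR` are hypothetical, and the two checks (exactly three arguments, an `http://` or `https://` URL) are assumptions about what is worth validating, not checks the tool itself is known to perform.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of dailp-ingest-clj.
# JAR is the path produced by `lein uberjar`.
JAR=target/uberjar/dailp-ingest-clj-0.1.0-SNAPSHOT-standalone.jar

# Validate the three positional arguments, then run the ingest.
# Returns 2 without starting Java if the arguments look wrong.
run_ingest() {
  if [ "$#" -ne 3 ]; then
    echo "usage: run_ingest OLD_URL USERNAME PASSWORD" >&2
    return 2
  fi
  case "$1" in
    http://*|https://*) ;;
    *)
      echo "error: OLD URL must start with http:// or https://" >&2
      return 2
      ;;
  esac
  java -jar "$JAR" "$1" "$2" "$3"
}
```

With this function defined, `run_ingest http://127.0.0.1:61001/old/ admin adminA_1` would run the same command as the local Docker Compose example.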
Note: this ingest script requires that you have read access to the Google Sheet files listed below.
- Functional morphemes:
  - Tags (DONE): https://docs.google.com/spreadsheets/d/1eEk3JP2WTkP8BBShBHURrripKPredy-sCutQMiGfVmo/edit#gid=0
  - Orthographic Inventories (DONE): https://docs.google.com/spreadsheets/d/16Dfq04tCSP0kuqBdMX1R3DHJ7kB6RJ3Uy7-ufN-Y_w8/edit#gid=886203972
  - Syntactic Categories (DONE): https://docs.google.com/spreadsheets/d/159i_Cygdqsnp55QBzqJu7eozxsNEiVIiXhwEzls3q7g/edit?usp=sharing
  - Prepronominal Prefixes (DONE): https://docs.google.com/spreadsheets/d/12v5fqtOztwwLeEaKQJGMfziwlxP4n60riMsN9dYw9Xc/edit#gid=0
  - Pronominal Prefixes (DONE):
    - Combined (DONE): https://docs.google.com/spreadsheets/d/1OMzkbDGY1BqPR_ZwJRe4-F5_I12Ao5OJqqMp8Ej_ZhE/edit?usp=sharing
    - Sets A & B (DONE): https://docs.google.com/spreadsheets/d/1D0JZEwE-dj-fKppbosaGhT7Xyyy4lVxmgG02tpEi8nw/edit?usp=sharing
    - Reflexive & Middle (DONE): https://docs.google.com/spreadsheets/d/1Q_q_1MZbmZ-g0bmj1sQouFFDnLBINGT3fzthPgqgkqo/edit?usp=sharing
  - Modal Suffixes (DONE): https://docs.google.com/spreadsheets/d/1QWYWFeK6xy7zciIliizeW2hBfuRPNk6dK5rGJf2pdNc/edit#gid=0
  - Aspectual Suffixes (DONE): https://docs.google.com/spreadsheets/d/19jPHtphsvWDliWq9z3WL_Fz6omHCFTFseD6fh1FLY70/edit#gid=0
- Verbs (roots and inflected forms) (1/2 DONE):
  - DF1975 (DONE):
  - DF2003 (DONE):
 
- Google Drive folder with master data sets: https://drive.google.com/drive/folders/1U2ZtSQfMbX1b86SbX3BJw0Cx5Fs78vZ7
- Google doc describing the functional closed-class spreadsheets: https://docs.google.com/document/d/1jUgIjOMH_c0HHnQaJZjBXny7XPConrphToaz8nyGDX0/edit#heading=h.j48zjb5g20tm
- Syntactic categories for Verb roots and inflected forms.
  - Should all of the verb roots have category "V"? Or should transitivity information be used to determine a transitivity-based category, e.g., "VT", "VTA", etc.?
  - What syntactic category do we want the inflected verb forms to have? I have been giving them "S". We could give them "VP" or some such thing ...
 
- Sources. What should the Sources be for DF 1975 and DF 2003? The best thing would be for me to create a Google Sheet for DAILP sources and automate the ingest of it. The ingest script will then have to be modified to document the correct source for each form ingested.
  - See my draft sources GSheet at: https://docs.google.com/spreadsheets/d/1W46XymhtohAizs_KVRCNvfTUL0k4-LWwYbEv8aHau_4/edit?usp=sharing
 
- Do the "surface form" values of the DF1975--Master spreadsheet need to be modified in any way? 
- DF1975-Master Questions.
  - Row 1881 has a root line that only has values for "Transitivity" "I" and "UDB Class" "4a.i.irr.". I have been taking this to mean that there is a verb root with shape "hno:" that is the intransitive counterpart of transitive "tell". I have been adding a new OLD form for this intransitive verb root. Is this correct?
    - Note: There appear to be 4 rows with only a "Transitivity" value as described just above, 3 with "I", and one with "T".
  - There are about 100 forms lacking translations. To find them, search in the OLD for translations with the following transcription value: "FIXME TRANSLATION NEEDED".
  - There are a handful of forms lacking valid morpheme gloss values. In some (9) cases a default value of "FIXME.MORPHEME.GLOSS.NEEDED" was used. Search for this value to find them. In a handful of cases, a value was constructed using the first translation value. Here are those constructed values:
    - "scatter.(intransitive)"
    - "pour.into.a.container,.fill.up"
    - "(sun.or.moon).shine,.be.sunny"
    - "(the.ground).become.frosty"
    - "thunder"
 
 
- Tags for Affix Allomorphs.
  - Allomorph 4 of "Reflexive & Middle Pronominal Prefixes". What tag should be used for these? I am using the "pp-pre-v" tag. Is this correct?
  - Allomorph 4 of "Modal Suffixes". What tag should be used for these? I am using the "mod-pre-v" tag. Is this correct?
 
- Morpheme break transcription conventions mismatch. I notice that different transcription conventions are being used for the morpheme break line of morphemes and for the same line of inflected verb forms.
  - For example, this (DF2003) inflected verb:
    - transcription: ᎯᏕᎸᎢ
    - morpheme break: /hi:-t-e:!l-v:'i/
    - morpheme gloss: 2SG>AN-give.LG-PFT-FUT.IMP
    - translations: Give it (LG) to him later!
  - Presumably contains this 2SG>AN morpheme:
    - transcription: hii
    - morpheme break: /hii/
    - morpheme gloss: 2SG>AN
    - translations: Set A 2SG AN Pronominal Prefix
  - However, observe that the colon is used to signify length in the former (hi:) while double vowels are used in the latter (hii).
  - Similarly, the DF2003 morpheme break values use the glottal stop Unicode character while the aspectual suffixes morpheme break values use the apostrophe.
  - We should probably enforce some consistency here, especially in anticipation of parser development. Guidance on which forms to modify?
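One possible way to reconcile the two morpheme break conventions is a small text filter like the sketch below. It is only an illustration under stated assumptions: the function name `normalize_breaks` is hypothetical, the target convention (doubled vowels and apostrophes) is an arbitrary choice since the question of which forms to modify is still open, the vowel letters are assumed to be a, e, i, o, u and v, and the glottal stop character is assumed to be U+0294 (ʔ).

```shell
#!/bin/sh
# Hypothetical filter, not part of dailp-ingest-clj: rewrites morpheme break
# values read from standard input into a single convention.
# Assumptions: long vowels become doubled vowels (hi: -> hii), the vowel
# letters are a, e, i, o, u and v, and the glottal stop is U+0294.
normalize_breaks() {
  sed -E \
    -e 's/([aeiouv]):/\1\1/g' \
    -e "s/ʔ/'/g"
}

# The two morpheme break values from the example; the quoted heredoc keeps
# the ! and ' characters literal.
normalize_breaks <<'EOF'
/hi:-t-e:!l-v:'i/
/hii/
EOF
```

Under these assumptions the inflected form comes out as `/hii-t-ee!l-vv'i/` and the prefix stays `/hii/`, so the 2SG>AN morpheme is spelled the same way in both. Going the other direction (colons for length) is not a simple mirror image, because a doubled vowel could in principle also be two adjacent short vowels.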
Copyright © 2019 Joel Dunham
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.