Script to reserve ontology term IDs #1

@jamesaoverton

When adding a new term to an ontology, we need to give it a new ID. IDs are usually numeric and sequential, e.g. OBI:0000070. Because people may be working on multiple branches in parallel, the term reservations have to live outside the development branches of the ontology repository.

Here's an example where we started coordinating OBI term IDs via a Google Sheet:

https://docs.google.com/spreadsheets/d/1tpDrSiO1DlEqkvZjrDSJrMm7OvH9GletljaR-SDeMTI

The important information is:

  • term ID
  • term label
  • developer
  • date
  • comment, preferably with a GitHub issue or PR number

The main drawbacks of a Google Sheet are that (1) it uses a username/authentication mechanism separate from GitHub's, and (2) it's a pain to write an authenticated script to add a new request.

So I would prefer to use GitHub. There should be a reserved-terms.txt file on a special branch of the repo named term-ids. Each line of the file should start with a term ID (e.g. OBI:0012345) followed by a space and the label. The commit message should include a comment with a GitHub issue or PR number (e.g. #1234). The commit will record the date and the username. The git blame view will then show all the important information above. Users can edit the reserved-terms.txt file manually using the GitHub web interface.
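As a rough illustration of the line format described above, here is a minimal parsing sketch. The function name parse_terms and the regular expression are assumptions for illustration only; in particular, the pattern assumes every ID is a letters-only prefix, a colon, and digits, which the issue implies but does not strictly require.

```python
import re

# Assumed ID shape: letters, a colon, digits (e.g. OBI:0012345),
# then a single space and the label.
LINE_PATTERN = re.compile(r"^([A-Za-z]+):(\d+) (.+)$")


def parse_terms(text):
    """Return a list of (prefix, number_string, label) tuples.

    Blank lines are skipped; any other line that does not match
    the assumed format raises ValueError with its line number.
    """
    terms = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        match = LINE_PATTERN.match(line.rstrip())
        if match is None:
            raise ValueError(
                f"line {line_number}: expected 'ID label', got {line!r}"
            )
        terms.append(match.groups())
    return terms
```

The number is kept as a string so that the zero padding in the original ID is not lost.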

There can also be a published-terms.txt file, in the same format as reserved-terms.txt, listing all the officially published term IDs for the ontology (e.g. OBI:0000070 assay).

To supplement manual edits, I want a reserve-terms.py script that will:

  1. read the published-terms.txt file from GitHub
  2. read the reserved-terms.txt file from GitHub
  3. figure out the next available ID
  4. either:
    A. accept one or more new term labels as command-line arguments, or
    B. read a local file containing one or more new term labels: one label per line
  5. check that the requested labels are not already present in published-terms.txt or reserved-terms.txt; if any are present, print a helpful error and quit
  6. assign new IDs for each new label, and print the IDs and labels to STDOUT
  7. append lines to reserved-terms.txt -- in memory, don't rely on local files
  8. either:
    A. accept a commit message as a command-line argument, or
    B. prompt the user for a commit message
  9. commit the change to GitHub by:
    A. using an OAuth token from the environment, or
    B. prompting the user for a GitHub username and password, or
    C. something more clever?
  10. print a link to the commit
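
Steps 3, 5, 6 and 7 are pure text processing and could be sketched roughly as follows. The names parse and reserve, and the prefix and width parameters, are hypothetical. The sketch also makes two assumptions that the list above leaves open: "next available ID" means one past the highest number seen in either file (gaps are not reused), and labels are compared by exact string match.

```python
def parse(text):
    """Split each non-empty line into (term_id, label) at the first space."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            term_id, _, label = line.strip().partition(" ")
            pairs.append((term_id, label))
    return pairs


def reserve(published_text, reserved_text, new_labels, prefix="OBI", width=7):
    """Return (new_lines, updated_reserved_text), all in memory."""
    if not new_labels:
        return [], reserved_text
    existing = parse(published_text) + parse(reserved_text)

    # Step 5: refuse labels that are already published or reserved.
    taken = {label for _, label in existing}
    clashes = [label for label in new_labels if label in taken]
    if clashes:
        raise ValueError(
            "labels already published or reserved: " + ", ".join(clashes)
        )

    # Step 3: assumed to be one past the highest number in either file.
    numbers = [int(term_id.partition(":")[2]) for term_id, _ in existing]
    next_number = max(numbers, default=0) + 1

    # Step 6: assign sequential, zero-padded IDs.
    new_lines = [
        f"{prefix}:{next_number + offset:0{width}d} {label}"
        for offset, label in enumerate(new_labels)
    ]

    # Step 7: append to the reserved text without touching local files.
    if reserved_text and not reserved_text.endswith("\n"):
        reserved_text += "\n"
    return new_lines, reserved_text + "\n".join(new_lines) + "\n"
```

The caller would print new_lines to STDOUT and pass the updated text on to the commit step. Duplicate labels within a single request are not checked here.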

I don't want this script to use the git CLI or checkout files locally. I'd like it to keep published-terms.txt and reserved-terms.txt in memory.
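
One way to meet this constraint is the GitHub REST API "repository contents" endpoints, which read and update a single file over HTTPS. The sketch below covers only option A from step 9 (a token) and uses just the standard library. The function names, the "owner/repo" argument form, and the idea of reading the token from an environment variable such as GITHUB_TOKEN are assumptions, not something the issue specifies.

```python
import base64
import json
import urllib.request

API = "https://api.github.com"


def _request(method, url, token, payload=None):
    """Send an authenticated JSON request and return the decoded reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Accept", "application/vnd.github+json")
    request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def read_file(repo, path, branch, token):
    """Return (text, blob_sha) for a file on the given branch."""
    url = f"{API}/repos/{repo}/contents/{path}?ref={branch}"
    info = _request("GET", url, token)
    text = base64.b64decode(info["content"]).decode("utf-8")
    return text, info["sha"]


def build_commit_payload(text, sha, message, branch):
    """Build the JSON body for updating an existing file."""
    return {
        "message": message,
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "sha": sha,  # blob SHA of the version being replaced
        "branch": branch,
    }


def commit_file(repo, path, branch, token, text, sha, message):
    """Commit new file contents and return the commit's web URL."""
    payload = build_commit_payload(text, sha, message, branch)
    url = f"{API}/repos/{repo}/contents/{path}"
    result = _request("PUT", url, token, payload)
    return result["commit"]["html_url"]
```

Because the update must name the blob SHA that was read, a reservation made by someone else in the meantime should cause the commit to be rejected rather than silently overwritten, so the script can re-read and retry. Very large files may need a different endpoint, which is worth checking if published-terms.txt grows.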
