Transcriptomics EDAM is an ontology developed to model experimental processes and related entities in computational transcriptomics experiments.
This ontology was developed to improve coverage of the objects, processes, and intermediate states commonly encountered in transcriptomics workflows. It combines the original EDAM ontology (v1.25) with STATO and adds terms extracted from transcriptomics publications.
- We introduce an upper-level class, `data status`, to represent transformations in which the data type and format remain unchanged but the data content changes. For example, a dataset may be filtered without any change to its file format or structural type.
- We add a `Database` branch, because references to databases often implicitly indicate both data types and associated operations. Because there are so many named databases, only a representative subset is included in the core ontology (see the `withoutDatabases` folder). The `withDatabases` folder contains a more comprehensive collection of database names, curated manually and sourced from the Nucleic Acids Research (NAR) database collection. However, this version is less actively maintained, as it introduces significant overhead during normalisation.
- To better represent statistical objects and analytical operations, we import relevant branches from an existing statistics ontology (STATO).
- We include object properties that capture biological and analytical semantics, such as `has input` and `has means`. While expressive, they are currently used sparingly because applying them consistently requires substantial curation effort.
- To further improve coverage, we collected frequently used transcriptomics-related entities from the literature and added them to the ontology. Definitions and synonyms for these classes were generated using large language models and are currently presented as a flat list.
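The `data status` idea and the object properties can be illustrated with a small data model. The Python sketch below is not the ontology's implementation. `DataNode`, `apply_status`, `describe_step`, the instance labels, and the `has output` property (assumed here to mirror EDAM's `has_output` relation) are all hypothetical. It records filtering as a status change that leaves type and format untouched, and then expresses the processing step as subject, predicate, object triples.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataNode:
    """A piece of data described by type, format and accumulated status."""

    label: str
    data_type: str        # illustrative label for a class in the "data" branch
    data_format: str      # illustrative label for a class in the "format" branch
    status: tuple = ()    # hypothetical "data status" labels applied so far, in order


def apply_status(node: DataNode, status: str, label: str) -> DataNode:
    """Return a new node whose content changed; type and format are kept."""
    return replace(node, label=label, status=node.status + (status,))


def describe_step(operation: str, inputs, means: str, output: DataNode):
    """Emit (subject, predicate, object) triples for one processing step."""
    triples = [(operation, "has input", node.label) for node in inputs]
    triples.append((operation, "has means", means))
    # Assumption: "has output" mirrors EDAM's has_output relation.
    triples.append((operation, "has output", output.label))
    return triples


# Illustrative encoding of:
# "The gene lists were filtered based on differential expression analysis."
gene_list = DataNode("gene_list_1", "Gene list", "TSV")
filtered = apply_status(gene_list, "filtered", "gene_list_1_filtered")
triples = describe_step(
    "Filtering", [gene_list], "Differential gene expression analysis", filtered
)

for triple in triples:
    print(triple)
```

Because `DataNode` is frozen, each status change yields a new node, so the unfiltered and filtered gene lists remain distinct entities that share a type and format.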
During transcriptomics methodology development, this ontology is continuously updated based on manual inspection and practical modelling needs.
- How should the statement “The gene lists were filtered based on differential expression analysis” be normalised and represented semantically?
- If a study uses the Gene Expression Omnibus, what types of analyses are implied, and which data formats should be expected and modelled?
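For the Gene Expression Omnibus question, one way to frame the modelling task is as a lookup from a normalised database name to the data types, formats, and operations it implies. In the Python sketch below, the table contents, the synonym map, and the function names are hypothetical placeholders rather than the ontology's actual `Database` branch. Only the format names correspond to file types that GEO distributes (SOFT, MINiML, and Series Matrix).

```python
# Hypothetical excerpt of database-to-annotation links. The entries are
# illustrative placeholders, not copied from the ontology's Database branch.
IMPLIED_BY_DATABASE = {
    "Gene Expression Omnibus": {
        "data": {"Gene expression profile"},
        "formats": {"SOFT", "MINiML", "Series Matrix"},
        "operations": {"Data retrieval", "Differential gene expression profiling"},
    },
}

# Hypothetical synonym table used to normalise free-text mentions.
SYNONYMS = {"GEO": "Gene Expression Omnibus"}


def normalise(mention: str) -> str:
    """Map an abbreviation or variant spelling to a canonical database name."""
    cleaned = mention.strip()
    return SYNONYMS.get(cleaned, cleaned)


def implied_annotations(mentions):
    """Collect data types, formats and operations implied by database mentions."""
    result = {"data": set(), "formats": set(), "operations": set()}
    for mention in mentions:
        entry = IMPLIED_BY_DATABASE.get(normalise(mention))
        if entry is None:
            continue  # unknown database: nothing can be inferred
        for key in result:
            result[key] |= entry[key]
    return result


annotations = implied_annotations(["GEO", "SomeUnlistedDatabase"])
print(sorted(annotations["formats"]))  # ['MINiML', 'SOFT', 'Series Matrix']
```

Unlisted databases contribute nothing, which keeps the lookup conservative. A fuller version would read these links from the ontology itself rather than from a hard-coded dictionary.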
withoutDatabase
| Version | Total classes | Branch operation | Branch topic | Branch data | Branch format |
|---|---|---|---|---|---|
| 0.0.2 |