v1.6.0
This release brings order to the IDs and sources of concepts. Concepts, which were inaccurately called 'terms' before, may come from various sources. The very same concept may even be included in multiple sources that should be imported into the database. This is actually the main reason why the concept plugin was created in the first place. The following issues may arise:
- Different concepts from different sources (e.g. databases) may have the same IDs
  - while the IDs will be unique within the source database, they might not be unique across databases; examples are NCBI Gene and NCBI Taxonomy, which both use plain numbers as IDs
- The same concept from different sources may have different IDs in those respective sources
  - this occurs, for example, when importing BioPortal ontologies that merely reformulate a database that originally existed without being an ontology. Examples are all the UMLS ontologies that have been imported into BioPortal.
Within the plugin, each concept may have an original ID paired with the original source. This should be the unique ID from the original database the concept came from. In addition, there is the source ID paired with a source. Each database containing a concept may be a source of it. Thus, in Neo4j, each concept may have multiple source IDs and multiple sources. All these items (original ID / original source, [secondary] source ID / [secondary] source) together form the concept coordinates. Each imported concept must now have coordinates. The original ID and original source are optional since they might not be known. The rule that checks whether a concept already exists works as follows:
Two concepts are equal iff
* they have the same original ID assigned by the same original source, or
* neither has a contradicting original ID or original source, and both have the same source ID and source.
  * Contradicting means two non-null values that are not equal.
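
The equality rule above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `CoordinatesSketch`, its fields, and the methods `contradicts` and `sameConcept` are hypothetical names chosen for this example and are not the plugin's actual `ConceptCoordinates` API. IDs and sources are modeled as plain strings, with `null` standing for an unknown value.

```java
import java.util.Objects;

/** Hypothetical stand-in for concept coordinates; NOT the plugin's real class. */
final class CoordinatesSketch {
    final String originalId;     // may be null when the original ID is unknown
    final String originalSource; // may be null when the original source is unknown
    final String sourceId;       // ID in the (secondary) source
    final String source;         // the (secondary) source itself

    CoordinatesSketch(String originalId, String originalSource,
                      String sourceId, String source) {
        this.originalId = originalId;
        this.originalSource = originalSource;
        this.sourceId = sourceId;
        this.source = source;
    }

    /** Two values contradict when both are non-null and not equal. */
    static boolean contradicts(String a, String b) {
        return a != null && b != null && !a.equals(b);
    }

    /** Applies the two-part equality rule to a pair of coordinates. */
    static boolean sameConcept(CoordinatesSketch a, CoordinatesSketch b) {
        // Rule 1: same original ID assigned by the same original source.
        boolean sameOriginal = a.originalId != null && a.originalSource != null
                && Objects.equals(a.originalId, b.originalId)
                && Objects.equals(a.originalSource, b.originalSource);
        if (sameOriginal) {
            return true;
        }
        // Rule 2: no contradiction in the original coordinates,
        // and the same source ID from the same source.
        boolean noContradiction = !contradicts(a.originalId, b.originalId)
                && !contradicts(a.originalSource, b.originalSource);
        boolean sameSource = a.sourceId != null && a.source != null
                && Objects.equals(a.sourceId, b.sourceId)
                && Objects.equals(a.source, b.source);
        return noContradiction && sameSource;
    }
}
```

With this sketch, two coordinates that share the made-up ID `"42"` but name different original sources are treated as different concepts, while a coordinate with unknown original values matches one with known original values as long as both carry the same source ID and source.
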
Coordinates are now always used when a particular concept should be addressed. This also applies to the connections to a parent concept, which are expressed through coordinates as well.
Currently there exist two different coordinates classes, `ConceptCoordinates` and `TermCoordinates`. The `TermCoordinates` are older and only represent an `(ID,source)` pair without specifying whether it is an original ID and source or a secondary source. The `TermCoordinates` are marked as deprecated and will be removed in future versions. They are currently still used in some places and should be replaced by `ConceptCoordinates`.
In addition, the internal process of concept insertion has been restructured and should now be more efficient.