[Meta] Support multiple data types in a single instance

It should be possible to store records of different data types (EAD, LIDO) in a single Datahub installation.

Version 1.x doesn't support multiple data types: a Datahub instance supports only one. If you want to aggregate and disseminate data from multiple sources or formats, you need to install multiple instances, e.g. one instance for LIDO and another for EAD.

**Motivation** 

* Organisations don't store and manage one type of data (museal, archival, library,...) but multiple types.
* Installing multiple instances implies extra costs and might be cumbersome (hosting, maintenance, security, workflows & processes,...)

**Discussion / Impact**

The APIs (OAI/REST) need to be able to differentiate between formats (EAD, LIDO, ...). More accurately, the APIs need to differentiate between the data types of the records, or even between different types of collections of records, independent of their formatting. Here, a collection references a specific context (e.g. records describing all digitised photos created by an organisation, the archival holdings of James Joyce, sheet music written by a composer, or archived medical records from a hospital).

Content negotiation allows the same record to be delivered in different formats & data models. Since Dublin Core is a general purpose format / model, the same metadata record could be delivered as DC and LIDO. But it does not make sense to model the same record as both LIDO and EAD, since a record is either an archival description or an object record, not both. Context matters. Formats such as MODS sit between the specificity of MARC (library data) and the general purpose flexibility of DC. It would be possible to describe museal objects in both MODS and LIDO, since both models lend themselves to that end. But equally, we could store two discrete collections of records in the hub: MODS formatted library data & LIDO museal data.

The core principle of the Datahub, though, is to avoid any internal data modelling (transformations, mappings, editing, ...) of the records themselves, since those add complexity and assumptions that decrease flexibility. Data modelling is always deferred as an external responsibility. This implies that a record might or might not have a DC, MODS or LIDO representation, which makes formal content negotiation, where every record always has multiple representations, less feasible to implement.
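
To make the constraint concrete, here is a minimal Python sketch. The `Record` class, the format keys and the `negotiate` helper are hypothetical and not part of the Datahub. It only illustrates that, when the hub does no mapping of its own, a requested format can be served only if that representation was supplied externally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """A stored record with whatever representations were supplied externally."""
    identifier: str
    # Maps a format name (e.g. "lido", "dc") to the serialized payload.
    representations: Dict[str, str] = field(default_factory=dict)


def negotiate(record: Record, accepted: List[str]) -> Optional[str]:
    """Return the first accepted format the record actually has, else None."""
    for fmt in accepted:
        if fmt in record.representations:
            return fmt
    return None


# Hypothetical sample data: an object record and an archival description.
photo = Record("photo-1", {"lido": "<lido:lido/>", "dc": "<oai_dc:dc/>"})
finding_aid = Record("fonds-7", {"ead": "<ead/>"})
```

A caller would have to turn a `None` result into an error response (for instance HTTP 406 Not Acceptable), which is exactly the case that strict content negotiation assumes never happens.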

Collections, on the other hand, could potentially be modelled through dynamic URL paths within both the REST and OAI endpoints. The hub would expose one REST API that serves all collections, and multiple OAI endpoints (one per collection). Collections would be managed through the administrative interface.
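
As a rough illustration of collection-scoped paths, the Python sketch below resolves a request path to an endpoint kind and a collection slug. The URL layout (`/api/collections/{collection}/records`, `/oai/{collection}`), the slug syntax and the `resolve` function are assumptions made for the example, not an agreed design.

```python
import re
from typing import Optional, Tuple

# Hypothetical URL layout: one REST API covering all collections,
# and one OAI endpoint per collection.
REST_PATTERN = re.compile(r"^/api/collections/(?P<collection>[a-z0-9-]+)/records/?$")
OAI_PATTERN = re.compile(r"^/oai/(?P<collection>[a-z0-9-]+)/?$")


def resolve(path: str) -> Optional[Tuple[str, str]]:
    """Return (endpoint kind, collection slug) for a path, or None if unmatched."""
    for kind, pattern in (("rest", REST_PATTERN), ("oai", OAI_PATTERN)):
        match = pattern.match(path)
        if match:
            return kind, match.group("collection")
    return None
```

With this layout, `resolve("/oai/digitised-photos")` yields the `oai` kind and the `digitised-photos` slug, so a harvester pointed at one collection never sees records from another.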

This would also impact the OAuth implementation. Currently, protecting access to the data is all or nothing. But access is also context-driven: one could, for example, differentiate between digitised photos that are accessible to the public and those kept private because of copyright.
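
One possible shape for context-driven access is a check per collection instead of per installation. The Python sketch below is only an assumption about how that could look: the `Collection` class, the `public` flag and the `read:<slug>` scope naming are hypothetical and do not describe the current OAuth implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Collection:
    slug: str
    public: bool  # True when anyone may read without a token


def can_read(collection: Collection, token_scopes: Optional[FrozenSet[str]]) -> bool:
    """Allow reads on public collections, or when the token carries the collection's scope."""
    if collection.public:
        return True
    if token_scopes is None:  # anonymous request
        return False
    return f"read:{collection.slug}" in token_scopes


# Hypothetical collections: photos split by copyright status.
public_photos = Collection("photos-public", public=True)
restricted_photos = Collection("photos-copyrighted", public=False)
```

Here an anonymous client can read `public_photos`, while `restricted_photos` requires a token whose scopes include `read:photos-copyrighted`.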






[Meta] Support multiple data types in a single instance #39
