[Meta] Support multiple data types in a single instance #39

Closed
netsensei opened this issue Jan 14, 2018 · 1 comment

@netsensei
Contributor

It should be possible to store records in different data types (EAD, LIDO) in a single Datahub installation.

Version 1.x doesn't support multiple data types: a Datahub instance supports only one. If you want to aggregate and disseminate data from multiple sources / formats, you need to install multiple instances, e.g. one instance for LIDO and one for EAD.

Motivation

  • Organisations don't store and manage just one type of data (museal, archival, library, ...) but multiple types.
  • Installing multiple instances implies extra costs and might be cumbersome (hosting, maintenance, security, workflows & processes, ...).

Discussion / Impact

The APIs (OAI/REST) need to be able to differentiate between formats (EAD, LIDO, ...). More accurately, the APIs need to differentiate between the data types of the records, or even between different collections of records independent of their formatting. A collection here references a specific context (e.g. records describing all digitised photos created by an organisation, the archival holdings of James Joyce, sheet music written by a composer, archived medical records from a hospital, etc.).

Content negotiation allows the same record to be delivered in different formats & data models. Since Dublin Core is a general-purpose format / model, the same metadata record could be delivered as DC and LIDO. But it does not make sense to model the same record as LIDO and EAD, since a record is either an archival description or an object record, not both. Context matters. Formats such as MODS sit between the specificity of MARC (library data) and the general-purpose flexibility of DC. It would be possible to describe museal objects in both MODS and LIDO, since both models lend themselves to that end. But equally, we could store two discrete collections of records in the hub: MODS-formatted library data & LIDO museal data.

The core principle of the Datahub, though, is to avoid any internal data modelling (transformations, mappings, editing, ...) of the records themselves, since those add complexity and assumptions that decrease flexibility. Data modelling is always deferred as an external responsibility. This implies that a record might or might not have a DC, MODS or LIDO representation. That makes formal content negotiation, where every record always has multiple representations, less feasible to implement.
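
To illustrate why this is awkward, here is a minimal Python sketch of negotiation over records that only carry the representations they were ingested with. It is not Datahub code: the function name `negotiate` and the media type strings are hypothetical, and quality values and wildcards in the Accept header are ignored for brevity.

```python
def negotiate(representations, accept_header):
    """Pick the first requested media type the record actually has.

    `representations` maps a media type to a stored serialisation.
    Nothing is transformed on the fly, so a record may simply lack
    the requested format. Returns (media_type, body) or None.
    """
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8" (ignored in this sketch).
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in representations:
            return media_type, representations[media_type]
    return None


# Hypothetical record that was only ever ingested as LIDO.
photo_record = {"application/lido+xml": "<lido:lido>...</lido:lido>"}

print(negotiate(photo_record, "application/mods+xml, application/lido+xml"))
print(negotiate(photo_record, "application/mods+xml"))  # None: no MODS stored
```

A `None` result would have to surface as something like an HTTP 406 response, and how often that happens depends entirely on what each data provider delivered.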

Collections, on the other hand, could potentially be modelled through dynamic URL paths within both the REST and OAI endpoints. The hub would offer one REST API that exposes collections, and multiple OAI endpoints (one per collection). Collections would be managed through the administrative interface.
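
As a rough sketch of what collection-based paths could look like, the Python snippet below resolves a request path to a collection. The URL layout, the collection slugs and the `resolve` function are all assumptions made for illustration, not existing Datahub routes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    slug: str       # hypothetical URL-safe identifier
    data_type: str  # e.g. "lido" or "ead"


# Hypothetical registry; in practice this would be filled from the
# administrative interface rather than hard-coded.
COLLECTIONS = {
    "photos": Collection("photos", "lido"),
    "joyce-archive": Collection("joyce-archive", "ead"),
}

# Assumed layout: one REST API with collections in the path,
# and one OAI endpoint per collection.
ROUTES = (
    ("rest", re.compile(r"^/api/collections/(?P<slug>[a-z0-9-]+)/records$")),
    ("oai", re.compile(r"^/oai/(?P<slug>[a-z0-9-]+)$")),
)


def resolve(path):
    """Return (api, collection) for a request path, or None if unknown."""
    for api, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            collection = COLLECTIONS.get(match.group("slug"))
            return (api, collection) if collection else None
    return None


print(resolve("/oai/photos"))
print(resolve("/api/collections/joyce-archive/records"))
```

Because each collection carries its own data type, the endpoints would not need to inspect individual records to know which format they serve.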

This would also impact the OAuth implementation. Currently, protecting access to the data is all or nothing. But access is also a context-driven aspect. One could, for example, differentiate between digitized photos that are accessible to the public and those kept private because of copyright.
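
One way to picture per-collection access is sketched below in Python. The policy structure, the scope names and the `can_read` function are hypothetical; the sketch only assumes that an OAuth token carries a set of scopes and that each collection is either public or requires one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionPolicy:
    public: bool
    required_scope: str = ""  # only relevant when public is False


# Hypothetical policies, keyed by collection slug.
POLICIES = {
    "photos-public": CollectionPolicy(public=True),
    "photos-copyrighted": CollectionPolicy(public=False, required_scope="read:photos-copyrighted"),
}


def can_read(collection_slug, token_scopes):
    """Decide whether a request may read records from a collection."""
    policy = POLICIES.get(collection_slug)
    if policy is None:
        return False  # unknown collections are never readable
    if policy.public:
        return True   # no token needed
    return policy.required_scope in token_scopes


print(can_read("photos-public", set()))       # True
print(can_read("photos-copyrighted", set()))  # False
print(can_read("photos-copyrighted", {"read:photos-copyrighted"}))  # True
```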

@netsensei added this to the Version 2.0 milestone on Sep 14, 2018
@netsensei
Contributor Author

This discussion will be continued in #90
