populate rights-related information on collection level automatically #42

Open
csae8092 opened this issue Feb 18, 2025 · 5 comments

@csae8092
Member

Currently acdh:hasOwner, acdh:hasRightsHolder and acdh:hasLicensor are mandatory on both Resource and (Top)Collection level.

Ideally

  • they can be made optional for (Top)Collections,
  • and automatically populated by the doorkeeper on ingest (similar to acdh:hasLicenseSummary)

Unlike acdh:hasLicenseSummary, the (Top)Collection should list all related objects rather than summarize them.

@sstuhec approves this as it would make data curation much easier (and less error prone)
@zozlak said it's technically doable

It may be worth discussing how to express this in the ontology, as we can't easily mark them as Automated Fill. And if they are marked optional for (Top)Collections, how should doorkeeper deal with any properties provided by the arche-metadata file: ignore/delete them, or merge them with the information from the child resources?

@zozlak
Member

zozlak commented Feb 18, 2025

> how should doorkeeper deal with any properties provided by the arche-metadata file

There are only two options available:

  • if it's a (Top)Collection, discard whatever is provided by the ingestion and fill it in with the data aggregated from children
  • if it's a (Top)Collection, take the union of whatever is provided by the ingestion and the data aggregated from children

Both options are fine for me but I guess curators prefer the latter.
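
To make the difference between the two options concrete, here is a minimal sketch in Python. It is not doorkeeper code: the function names, the `mode` parameter and the dict-of-sets metadata shape are assumptions made only for illustration.

```python
# Illustrative only: metadata is modelled as {property: set of values}.
RIGHTS_PROPERTIES = ("acdh:hasOwner", "acdh:hasRightsHolder", "acdh:hasLicensor")


def aggregate_from_children(children, prop):
    """Collect every value of `prop` found on any child resource."""
    values = set()
    for child in children:
        values.update(child.get(prop, ()))
    return values


def collection_values(provided, children, prop, mode="union"):
    """Compute the values of `prop` for a (Top)Collection.

    mode="discard": ignore what the ingestion provided, keep only aggregated values.
    mode="union":   keep what the ingestion provided and add the aggregated values.
    """
    aggregated = aggregate_from_children(children, prop)
    if mode == "discard":
        return aggregated
    if mode == "union":
        return set(provided.get(prop, ())) | aggregated
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical sample data.
children = [{"acdh:hasOwner": {"org:A"}}, {"acdh:hasOwner": {"org:B"}}]
provided = {"acdh:hasOwner": {"org:C"}}

discarded = collection_values(provided, children, "acdh:hasOwner", mode="discard")
merged = collection_values(provided, children, "acdh:hasOwner", mode="union")
```

With the sample data, the discard option yields only the children's owners, while the union option also keeps the owner supplied by the curator.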

> we can't easily mark them as Automated Fill? and if they are marked optional for (Top)Collections

Indeed we can't, as this will be class-dependent. While it is not a concern for the doorkeeper, it is for the checks performed by the metadata crawler. Peter's proposal to change these property cardinalities to 0-n on (Top)Collections would indeed solve the issue on the technical level.
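
A rough sketch of what a class-dependent cardinality check could look like under that proposal. This is not how the metadata crawler is implemented; the lookup function and the hard-coded class list are assumptions for illustration only.

```python
# Illustrative only: minimum cardinality depends on the class, not just the property.
RIGHTS_PROPERTIES = ("acdh:hasOwner", "acdh:hasRightsHolder", "acdh:hasLicensor")
COLLECTION_CLASSES = {"acdh:Collection", "acdh:TopCollection"}


def min_cardinality(rdf_class, prop):
    """Return the assumed minimum cardinality of `prop` on `rdf_class`."""
    if prop in RIGHTS_PROPERTIES:
        # Proposed: 0-n on (Top)Collections, still mandatory elsewhere.
        return 0 if rdf_class in COLLECTION_CLASSES else 1
    return 0


def missing_required(rdf_class, metadata):
    """List rights properties with fewer values than the class requires."""
    return [
        prop
        for prop in RIGHTS_PROPERTIES
        if len(metadata.get(prop, ())) < min_cardinality(rdf_class, prop)
    ]


on_collection = missing_required("acdh:TopCollection", {})
on_resource = missing_required("acdh:Resource", {"acdh:hasOwner": {"org:A"}})
```

An empty (Top)Collection passes the check, while a Resource that only has an owner is still reported as missing the other two properties.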

@sstuhec
Contributor

sstuhec commented Feb 20, 2025

Is there also a third option, where nothing happens on ingestion if you already provide metadata? This is how other "Automated Fills" work: if you provide a value, the doorkeeper leaves it as it is; if you provide nothing, it adds it for you.

Since we cannot simply add the Automated Fill label, we should mention this in the description or add a note of some sort (if that is possible), so that it is clear that these properties are never empty (even though the minimum cardinality would be 0).

@zozlak
Member

zozlak commented Feb 20, 2025

> This is how other "Automated Fills" work: if you provide a value, the doorkeeper leaves it as it is; if you provide nothing, it adds it for you.

For some yes, for some it's always overwritten.

The problem with the approach you (@sstuhec) propose is that the doorkeeper will never refresh values once they have been generated, even if the corresponding property values in the (Top)Collection's children change. This is because it has no way to tell whether the existing values originally came from the doorkeeper (and thus should be refreshed) or from a metadata curator (and should be kept intact). Long story short, the automation will only work on the initial ingest. So it will be useless e.g. for Peter's resource-level-versioning collections, which are by design selectively updated over a longer time period.
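
A small sketch of why a "fill only when empty" rule goes stale. The function is hypothetical and only models the rule, not the doorkeeper itself.

```python
# Illustrative only: values are modelled as plain sets of identifiers.
def fill_if_empty(stored, children_values):
    """Keep existing values untouched; aggregate from children only when nothing is stored."""
    return set(stored) if stored else set(children_values)


# Initial ingest: nothing stored yet, so the children's owners are copied.
after_first_ingest = fill_if_empty(set(), {"org:A"})

# Later a child gains a second owner, but the collection already has values,
# and there is no way to tell they were generated, so they are left as they are.
after_update = fill_if_empty(after_first_ingest, {"org:A", "org:B"})
```

After the update the collection still lists only the first owner, which is the staleness described above.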

But do we have a counterexample where we would like to intentionally leave one of the child resources' owners/rightsHolders/etc. out of the (Top)Collection's owners/rightsHolders/etc. list?

By the way, what are the semantics of those properties on the (Top)Collection level, really?

@sstuhec
Contributor

sstuhec commented Feb 20, 2025

The semantics have been discussed. In the end we decided that for these properties we just accumulate everything that appears within the (Top)Collection, just like for Licenses (summary) and Access (summary). It is only different for Creators, because obviously we are not going to name all "has no name" creators. So, generally, the two options you mention should always work. I just wanted to see how strict we have to be and how open to curation we could leave it. So far we have not had a problem there, but you never know, especially because these properties on the TopCollection level are mentioned in the deposition agreement, and depositors usually fill them out themselves in the metadata spreadsheet (I presume the way they intend them to be).

I understand what you mean, so I would go for the second (union) option, but I don't quite understand how this can then work: when refreshing, wouldn't it also be unclear where the original values come from?

@zozlak
Member

zozlak commented Feb 20, 2025

> I understand what you mean, so I would go for the second (union) option, but I don't quite understand how this can then work: when refreshing, wouldn't it also be unclear where the original values come from?

With the refresh, the only problem is dropping values. Dropping a value requires manual curation, but if the metadata curator relies solely on the automatically generated summary, they can do a kind of manual curation by always providing sbj schema.delete acdh:hasOwner, etc. for all (Top)Collections, which causes the doorkeeper to regenerate the values on top of an empty value set.
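
The refresh behaviour of the union option can be sketched as follows. The function is a hypothetical model; clearing the stored set stands in for the effect of the delete triple mentioned above.

```python
# Illustrative only: values are modelled as plain sets of identifiers.
def refresh_union(stored, children_values):
    """Union refresh: new child values are added, existing values are never dropped."""
    return set(stored) | set(children_values)


# The collection lists two owners, but one child (and its owner) has since been removed.
stale = refresh_union({"org:A", "org:B"}, {"org:A"})

# Clearing the stored values first (the effect of the delete) lets the refresh
# rebuild the list purely from the current children.
regenerated = refresh_union(set(), {"org:A"})
```

A plain refresh keeps the outdated owner, while a refresh on top of an emptied value set reflects only what the children currently carry.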
