Description
The current configuration is too complicated and inefficient. It might be simplified if we could leverage JPQL search expressions in the config files.
Due to the design of icat.oaipmh, we first need to configure which properties of which ICAT objects are considered for an object to be disseminated over OAI-PMH. From this configuration, an internal XML representation of these objects is created. In a second step, this internal representation is transformed using XSLT.
Just to compile all the ICAT entity objects needed for the metadata of a data publication, the following configuration lines are needed:
# Identifiers for the configuration of metadata to be retrieved from ICAT
data.configurations = datapub
# Relevant data objects and properties for each data configuration
data.datapub.mainObject = DataPublication
data.datapub.stringProperties = pid title description subject
data.datapub.numericProperties = id
data.datapub.dateProperties = publicationDate
data.datapub.subPropertyLists = users dates relatedItems fundingReferences content
data.datapub.users.stringProperties = orderKey fullName givenName familyName contributorType email
data.datapub.users.subPropertyLists = user affiliations
data.datapub.users.user.stringProperties = orcidId
data.datapub.users.affiliations.stringProperties = name pid fullReference
data.datapub.dates.stringProperties = dateType date
data.datapub.relatedItems.stringProperties = identifier relationType fullReference relatedItemType title
data.datapub.fundingReferences.subPropertyLists = funding
data.datapub.fundingReferences.funding.stringProperties = funderIdentifier funderName awardNumber awardTitle
data.datapub.content.subPropertyLists = dataCollectionDatasets
data.datapub.content.dataCollectionDatasets.subPropertyLists = dataset
data.datapub.content.dataCollectionDatasets.dataset.numericProperties = fileSize
data.datapub.content.dataCollectionDatasets.dataset.subPropertyLists = datafiles
data.datapub.content.dataCollectionDatasets.dataset.datafiles.subPropertyLists = datafileFormat
data.datapub.content.dataCollectionDatasets.dataset.datafiles.datafileFormat.stringProperties = type
This seems to be too clumsy.
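To make the structure of these keys more tangible, here is a minimal Python sketch that folds the flat `data.<config>.*` keys into a nested tree and lists the relationship paths they imply. The parsing rules, the function names `build_tree` and `include_paths`, and the dictionary layout are assumptions made for illustration; this is not how icat.oaipmh actually loads its configuration.

```python
# Sketch only: the parsing rules below are assumed, not taken from icat.oaipmh.
PROPERTY_KINDS = {"stringProperties", "numericProperties", "dateProperties"}


def new_node():
    return {"properties": [], "children": {}}


def build_tree(lines, config="datapub"):
    """Fold flat 'data.<config>.a.b.<kind> = v1 v2' lines into a nested dict."""
    prefix = f"data.{config}."
    tree = new_node()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if not key.startswith(prefix):
            continue
        # Everything before the last dot is the relationship path,
        # the last component says what kind of setting this is.
        *path, kind = key[len(prefix):].split(".")
        node = tree
        for name in path:
            node = node["children"].setdefault(name, new_node())
        if kind in PROPERTY_KINDS:
            node["properties"].extend(value.split())
        elif kind == "subPropertyLists":
            for name in value.split():
                node["children"].setdefault(name, new_node())
        elif kind == "mainObject":
            node["entity"] = value
    return tree


def include_paths(node, prefix=""):
    """List every relationship path, i.e. what an INCLUDE clause would name."""
    paths = []
    for name, child in node["children"].items():
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        paths.extend(include_paths(child, path))
    return paths


sample = """
data.datapub.mainObject = DataPublication
data.datapub.stringProperties = pid title description subject
data.datapub.subPropertyLists = users dates
data.datapub.users.subPropertyLists = user affiliations
data.datapub.users.user.stringProperties = orcidId
""".splitlines()

tree = build_tree(sample)
print(include_paths(tree))
```

Each entry in the printed list is one relationship that has to be spelled out across several configuration keys, whereas a JPQL expression can name it once.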
Roughly the same could be achieved with a single JPQL search expression:
SELECT dp FROM DataPublication dp INCLUDE dp.content AS dc, dc.dataCollectionDatafiles AS dcdf, dcdf.datafile AS df1, df1.datafileFormat, dc.dataCollectionDatasets AS dcds, dcds.dataset AS ds, ds.datafiles AS df2, df2.datafileFormat, dp.dates, dp.fundingReferences AS dpfun, dpfun.funding, dp.relatedItems, dp.users AS dpu, dpu.affiliations, dpu.user
Furthermore, the internal XML representation roughly corresponds one to one to the ICAT schema. This means that if we want to include the experimental techniques being used in an investigation, we need to include all datasets from that investigation in the internal representation, which might look something like:
<metadata>
  <datasets>
    <instance>
      <datasetTechniques>
        <instance>
          <technique>
            <name>neutron diffraction</name>
            <pid>PaNET:PaNET01217</pid>
          </technique>
        </instance>
      </datasetTechniques>
    </instance>
    <instance>
      <!-- ... -->
    </instance>
    <instance>
      <!-- ... -->
    </instance>
    <!-- ... -->
  </datasets>
  <!-- ... -->
</metadata>
Note that there may be hundreds of datasets in one investigation. Often they all have the same technique, but that is not guaranteed. The distinct techniques must then be extracted from that using XSLT, which is also somewhat involved.
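For illustration, the deduplication step can be sketched as follows. The sketch is written in Python rather than XSLT, purely to show the logic. It assumes the element layout from the example above; the second technique in the sample data, including its pid, is a made-up placeholder.

```python
import xml.etree.ElementTree as ET


def distinct_techniques(xml_text):
    """Return the distinct (pid, name) pairs over all datasets, in document order."""
    root = ET.fromstring(xml_text)
    seen = {}  # dict keys keep insertion order and drop duplicates
    path = "./datasets/instance/datasetTechniques/instance/technique"
    for technique in root.iterfind(path):
        key = (technique.findtext("pid"), technique.findtext("name"))
        seen.setdefault(key, None)
    return list(seen)


def dataset(name, pid):
    # Helper to build one <instance> of the assumed internal representation.
    return (
        "<instance><datasetTechniques><instance><technique>"
        f"<name>{name}</name><pid>{pid}</pid>"
        "</technique></instance></datasetTechniques></instance>"
    )


# Three datasets, two of which share a technique.
# "PaNET:EXAMPLE" is a placeholder, not a real PaNET identifier.
sample_xml = (
    "<metadata><datasets>"
    + dataset("neutron diffraction", "PaNET:PaNET01217")
    + dataset("neutron diffraction", "PaNET:PaNET01217")
    + dataset("some other technique", "PaNET:EXAMPLE")
    + "</datasets></metadata>"
)

techniques = distinct_techniques(sample_xml)
print(techniques)
```

Even in this form, every dataset of the investigation has to be loaded and walked just to obtain what is usually a very short list.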
In principle, we could select the list of distinct techniques related to an investigation using one simple JPQL search statement like:
SELECT DISTINCT(t) FROM Technique t JOIN t.datasetTechniques AS dst JOIN dst.dataset AS ds JOIN ds.investigation AS i WHERE i.id = %d
(where the %d would need to be substituted with the internal id of that investigation.)
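The substitution itself could be as simple as the following Python sketch. The names `TECHNIQUE_QUERY` and `technique_query` are hypothetical, and the example id is arbitrary. How the resulting string would be passed to ICAT is left open here.

```python
# Query template as given above; %d is the placeholder for the investigation id.
TECHNIQUE_QUERY = (
    "SELECT DISTINCT(t) FROM Technique t "
    "JOIN t.datasetTechniques AS dst "
    "JOIN dst.dataset AS ds "
    "JOIN ds.investigation AS i "
    "WHERE i.id = %d"
)


def technique_query(investigation_id):
    """Fill the investigation id into the query template."""
    # Reject anything that is not a plain integer, so that arbitrary text
    # can never end up inside the query string.
    if isinstance(investigation_id, bool) or not isinstance(investigation_id, int):
        raise TypeError("investigation_id must be an integer")
    return TECHNIQUE_QUERY % investigation_id


query = technique_query(4711)  # 4711 is an arbitrary example id
print(query)
```

Restricting the placeholder to an integer keeps the configured search expression fixed apart from the id.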
So if we could compile the internal XML representation from a couple of JPQL searches configured in the config file, things might become significantly simpler.