Thesis: the differentiation between "compiled" and "uncompiled" schemas is not useful, and adds complexity both to the specification and tooling built around it. We should do away with it in the next MAJOR upgrade of HSDS, and seriously consider trying to minimise its impact on the current 3.x series.
Compiled Schemas are not produced declaratively
We know from #565 that compiled schemas are not produced declaratively. They are the result of specific compilation logic buried inside openreferral/hsds_schema_tools.
The organization.services array is manually added for the compiled version of organization.json, leading to confusion and issues with Profile generation:
- #565 (Confusion about field appearing in the compiled version)
- hsds_schema_tools/#9 (can't unlink service and organization)
Further, the non-declarative compilation of service_at_location.json results in more errors for profile authors using hsds_schema_tools:
This also obscures the function and rules governing the production of the various "package" and "list" schemas e.g. service_list.json and service_package.json.
The openapi.json spec refers to a mixture of compiled and non-compiled schemas
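The openapi.json file contains the following $ref values:
- /services/{id}: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/service.json"
- /services: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/service_list.json"
- /taxonomies/{id} and /taxonomies: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/taxonomy.json"
- /taxonomy_terms/{id} and /taxonomy_terms: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/taxonomy_term.json"
- /organizations/{id}: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/organization.json"
- /organizations: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/organization_list.json"
- /service_at_locations/{id}: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/service_at_location.json"
- /service_at_locations: "https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/service_at_location_list.json"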
This means that taxonomy.json and taxonomy_term.json are both returned in non-compiled form; they are in fact not compiled at all. It's unclear whether this is a deliberate omission or an oversight.
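To make the mixture easy to see, here is a minimal sketch that walks a parsed OpenAPI document and partitions its external $ref values by whether the URL path contains /compiled/. The helper names (iter_refs, split_refs) and the fragment's shape are my own assumptions for illustration; only the two URLs are quoted from openapi.json.

```python
from typing import Any, Iterator


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def split_refs(document: Any) -> tuple[list[str], list[str]]:
    """Partition external $ref values into (compiled, uncompiled) by URL path."""
    compiled, uncompiled = [], []
    for ref in iter_refs(document):
        if not ref.startswith("http"):
            continue  # skip local refs such as "#/components/..."
        (compiled if "/compiled/" in ref else uncompiled).append(ref)
    return sorted(set(compiled)), sorted(set(uncompiled))


# Hypothetical fragment, NOT the real structure of openapi.json.
BASE = "https://raw.githubusercontent.com/openreferral/specification/3.2/schema"
fragment = {
    "paths": {
        "/services/{id}": {"schema": {"$ref": f"{BASE}/compiled/service.json"}},
        "/taxonomies/{id}": {"schema": {"$ref": f"{BASE}/taxonomy.json"}},
    }
}
compiled_refs, uncompiled_refs = split_refs(fragment)
print(compiled_refs)
print(uncompiled_refs)
```

Running the same function over the real openapi.json (loaded with json.load) should surface the taxonomy schemas as the only uncompiled references, if my reading of the file is right.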
This also adds a lot of complexity when trying to accommodate Profiles. Either the author must manually override each $ref to make it match the location of their Profile schemas, or the tooling must do this.
hsds_schema_tools tries to circumvent the complexity but this results in bugs. It does a "find and replace" on the value for the base URI for version 3.0 of HSDS as it fetches the contents of openapi.json. This will fail for all versions of HSDS above 3.0, resulting in an erroneous openapi.json (source)
hsds-profile-wizard tries to tackle this head on, attempting to replace $ref values inside openapi.json to match the profile schema $id values. This results in by far the most complex function in the entire program, since it needs to decide whether something is compiled or not rather than simply looking for a $ref and replacing its base URI. (source)
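For contrast, this is roughly what "simply replacing the base URI" could look like if the tooling did not have to care about versions or about compiled versus uncompiled paths. It is a sketch under my own assumptions, not code from hsds_schema_tools or hsds-profile-wizard; the function name rebase and the example.org profile URI are hypothetical.

```python
import re

# Matches the HSDS schema base URI for any version number,
# e.g. ".../specification/3.0/schema/" or ".../specification/3.2/schema/".
HSDS_BASE = re.compile(
    r"https://raw\.githubusercontent\.com/openreferral/specification/"
    r"\d+(?:\.\d+)*/schema/"
)


def rebase(openapi_text: str, profile_base: str) -> str:
    """Swap the HSDS schema base URI for a Profile's base URI, whatever the version."""
    if not profile_base.endswith("/"):
        profile_base += "/"
    # A function replacement avoids backslash-escape surprises in profile_base.
    return HSDS_BASE.sub(lambda _match: profile_base, openapi_text)


text = (
    '{"$ref": "https://raw.githubusercontent.com/openreferral/'
    'specification/3.2/schema/compiled/service.json"}'
)
rebased = rebase(text, "https://example.org/profile/schema")  # hypothetical URI
print(rebased)
```

Note that this keeps the compiled/ path segment intact, so it only works cleanly if the Profile mirrors the upstream directory layout, which is exactly the coupling this issue argues against.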
There are errors and/or discrepancies in the compiled schemas
This relates to the fact they're not declarative. In particular, the "list" schemas returned by the API are modified in ways which leave them lacking properties that are present in the base schemas. This is partly by design, but results in bugs and discrepancies.
When generating the list schemas (e.g. service_list.json, organization_list.json), the hsds_schema_tools script removes all of the one-to-many properties from the list. From what context I can infer, this is possibly to support the API defining return types which don't result in infinitely nested and recursive services, organizations, and service_at_location objects:
However, this introduces discrepancies, and arguably errors. The model of service returned by GET /services/{id} contains fields such as service.schedules (an array of schedule.json) and service.additional_urls (an array of url.json). These are not present in the service_list.json schema and therefore are not declared to be part of the results of GET /services.
This means validation tooling will not be able to validate the contents of these fields for the results of GET /services, and there is a discrepancy between different versions of the service schema. The same is true of organization_list.json and service_at_location_list.json.
Further investigation into the compilation step has revealed that the difference between the *_list.json and respective *_package.json compiled schemas is that the list schemas have their one-to-many relationships removed, likely to avoid infinite nesting and/or circular $ref resolution when validating data. This is handled inside hsds_schema_tools on L510 (for service).
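To illustrate the kind of transformation being described, here is a minimal sketch that drops every array-typed property from a schema. This is NOT the actual hsds_schema_tools code; the function name strip_one_to_many and the toy service schema are assumptions, and I'm treating "one-to-many" as "any property with type array".

```python
import copy


def strip_one_to_many(schema: dict) -> dict:
    """Return a copy of `schema` without its array-typed (one-to-many) properties."""
    result = copy.deepcopy(schema)
    props = result.get("properties", {})
    result["properties"] = {
        name: spec for name, spec in props.items() if spec.get("type") != "array"
    }
    return result


# Toy stand-in for service.json; only the property names come from HSDS.
service = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "schedules": {"type": "array", "items": {"$ref": "schedule.json"}},
        "additional_urls": {"type": "array", "items": {"$ref": "url.json"}},
    },
}
list_item = strip_one_to_many(service)
print(sorted(list_item["properties"]))  # ['id', 'name']
```

The point of the sketch is that nothing in the source schema declares this behaviour: a reader of service.json alone has no way to know which fields will survive into the list schema.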
As a result, compiled/service_list.json lacks fields added to the specification in recent MINOR versions, e.g. service.additional_urls. This field is present in service.json, compiled/service.json, and compiled/service_package.json but NOT in compiled/service_list.json. This presents a problem given that compiled/service_list.json is listed as a return type for GET /services in the OpenAPI definition.
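A check along these lines could flag such gaps automatically. It is a minimal sketch assuming both schemas expose a top-level properties object; the function name missing_properties and the toy schemas are hypothetical, and real use would load the two JSON files instead.

```python
def missing_properties(base: dict, derived: dict) -> list[str]:
    """Names declared in `base`'s properties but absent from `derived`'s."""
    base_names = set(base.get("properties", {}))
    derived_names = set(derived.get("properties", {}))
    return sorted(base_names - derived_names)


# Toy stand-ins for service.json and compiled/service_list.json.
service = {
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "schedules": {"type": "array"},
        "additional_urls": {"type": "array"},
    }
}
service_list = {
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
    }
}
print(missing_properties(service, service_list))
# ['additional_urls', 'schedules']
```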
There are a number of compiled schemas which are never used elsewhere
The compiled schemas directory contains the following schemas, which are not returned or referenced in any other part of the specification:
- organization_package.json
- service_at_location_package.json
- service_package.json
- service_with_definitions.json
- tabular.json
The schemas ending in _package.json are just schemas which wrap an object inside an array. The contents are de-referenced, meaning all the definitions are compiled into a single file. E.g. service_package.json is a schema where the top-level object is an array of service.json, and all the object definitions are de-referenced.
tabular.json appears to be a single JSON schema attempting to replicate a tabular serialisation of the other schemas. The top-level object contains keys representing each of the HSDS schemas, and all the definitions are de-referenced. From a cursory glance, it looks like the models are not modified further to support relational modelling, but can make use of some keys such as service.organization_id to model this.
service_with_definitions.json is an interesting one, as it represents an explicitly service-oriented view of HSDS, where the source schemas are all compiled into the definitions array of the JSON Schema file, and appear to have had various properties removed.
Handling this complexity makes HSDS, Profiles, and the tooling surrounding it, difficult to maintain
The presence of compiled schemas adds complexity to HSDS and makes it more difficult to maintain, without any substantial benefit.
Those developing Profiles need to be aware of Compiled Schemas, and of the specific complexities of individual compiled schemas, to understand how to derive what they want from their Profile. For example, they will need to make provision for the *_list.json schemas to maintain compatibility with upstream HSDS, or else explicitly deviate from this.
For tooling around Profiles, we have seen that compiled schemas introduce complexity into both the HSDS Schema Tools and the HSDS Profile Wizard: HSDS Schema Tools literally has bugs because of it, and the HSDS Profile Wizard is far more complex than it needs to be because of it.
Different paths are needed for schema $id fields so that compiled and uncompiled schemas don't have colliding identifiers. There is also a large amount of complexity in updating the $ref values inside openapi.json, because openapi.json references both compiled and uncompiled schemas.
Profile maintainers must manually refactor openapi.json to point away from compiled schemas, because some endpoints are dependent on the non-declarative logic of e.g. service_list.json. Otherwise they must manually provide their own service_list.json and take steps to compile it, or they must make use of the HSDS Schema Tools, which have known bugs for Profile generation.
Conclusion: We don't need compiled schemas
I posit that we don't need compiled schemas. Compiled schemas aren't magical; there's no need for dereferencing if we've got a good BaseURI (we have) and we can structure list and/or package schemas in a way that isn't dependent on strange compilation behaviour.
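As a rough illustration of what a declarative package-style schema could look like, the sketch below builds one as plain data: an array whose items point at the base schema by absolute $ref, with no dereferencing step. The function name package_schema, the $id layout, and the idea of placing it beside service.json are all assumptions of mine, not a proposal for the exact file contents.

```python
import json

BASE = "https://raw.githubusercontent.com/openreferral/specification/3.2/schema"


def package_schema(base_uri: str, name: str) -> dict:
    """Build an array schema whose items reference `<name>.json` under base_uri."""
    return {
        "$id": f"{base_uri}/{name}_package.json",  # hypothetical location
        "type": "array",
        "items": {"$ref": f"{base_uri}/{name}.json"},
    }


schema = package_schema(BASE, "service")
print(json.dumps(schema, indent=2))
```

Because the only link to service.json is a $ref, any field added to service.json in a MINOR release would be picked up without a compilation step, and a Profile would only need to change the base URI.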
In a future MAJOR upgrade, we should commit to doing away with the notion of compiled schemas altogether.
In the short/medium term, I propose the following things to lighten the maintenance and development burden for both HSDS and Profile developers/maintainers:
- Accept that the openapi.json file cannot be modified without triggering governance. This means we can't simply change GET /services to return something other than a Page of https://raw.githubusercontent.com/openreferral/specification/3.2/schema/compiled/service_list.json
- But we can stop generating the compiled schemas programmatically and start treating them like regular declarative schemas. Where compiled schemas contain discrepancies or errors, we can refactor them and treat this as a PATCH fix.
- tabular.json should simply be removed. It's woefully out of date, isn't the result of any automatic compilation as far as I can tell, and isn't used downstream by any docs or tooling. I think there's a solid argument that we can remove this as a bug via a PATCH update to the spec, alongside the other refactoring.