FHIR Analytics Annotation Algorithm

Bulk FHIR data transformations that apply a standard set of annotations to the FHIR data models, to better support the types of queries used in defining cohorts, calculating quality measures, and analyzing public health data.

Prototype implementation (unit tests).

Date, DateTime, Instant

Convert value to UTC
For partial dates, convert to a start and an end with sub-second precision. For example, '2018-05' is populated with a start date of '2018-05-01T00:00:00.000Z' and an end date of '2018-05-31T23:59:59.999Z'. '2017-03-01' is populated with a start date of '2017-03-01T00:00:00.000Z' and an end date of '2017-03-01T23:59:59.999Z'.
Instant types should have the same start and end
Add to resource as {elementName}_aa.start and {elementName}_aa.end
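
The steps above can be sketched in Python as follows. This is an illustrative sketch, not the prototype's code: the function name `annotate_date` is hypothetical, partial dates are taken to be in UTC (as in the examples above), and a full dateTime without an offset is assumed to be UTC. A full dateTime is treated like an instant (same start and end), which is also an assumption.

```python
import calendar
import re
from datetime import datetime, timezone

# Matches FHIR partial dates: YYYY, YYYY-MM or YYYY-MM-DD
_PARTIAL = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _fmt(dt):
    # Render in UTC with millisecond precision, e.g. 2018-05-01T00:00:00.000Z
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def annotate_date(value):
    """Return the {'start', 'end'} pair to store under {elementName}_aa."""
    m = _PARTIAL.match(value)
    if m:
        year, month, day = (int(g) if g else None for g in m.groups())
        start = datetime(year, month or 1, day or 1, tzinfo=timezone.utc)
        end_month = month or 12
        # Last day of the month when the day is not given
        end_day = day or calendar.monthrange(year, end_month)[1]
        end = datetime(year, end_month, end_day, 23, 59, 59, 999000,
                       tzinfo=timezone.utc)
        return {"start": _fmt(start), "end": _fmt(end)}
    # Full timestamp: a single point in time, so start == end
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return {"start": _fmt(dt), "end": _fmt(dt)}
```

For a Patient with `birthDate` of '2018-05', the result would be stored as `birthDate_aa.start` and `birthDate_aa.end`.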

FHIR Timing elements are ignored at present due to their limited use and the complexity involved in converting them into a date range.

String

Convert to uppercase
Follow Unicode code point normalization guidelines (for JS: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize)
Add to resource as {elementName}_aa

Note: applied to text element in CodeableConcept and display element in Coding
TODO: review string normalization implementations in HAPI and MS FHIR servers
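
A minimal Python sketch of the string annotation follows. The names `normalize_string` and `annotate_string` are hypothetical, and the choice of NFC (the default form of the JS `normalize` method linked above) is an assumption, since the normalization form is not specified here.

```python
import unicodedata


def normalize_string(value, form="NFC"):
    # Uppercase first, then apply Unicode normalization.
    # NFC is an assumption; the document does not name a form.
    return unicodedata.normalize(form, value.upper())


def annotate_string(element, name):
    """Add {name}_aa next to a string element of a dict, if present."""
    if isinstance(element.get(name), str):
        element[name + "_aa"] = normalize_string(element[name])
    return element
```

With this sketch, a Coding whose `display` is "Mixed" gains a `display_aa` of "MIXED", and composed and decomposed forms of the same accented character normalize to the same value.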

Resource Id

Build URL with base url, resourceType, and id
Remove scheme
Hash with SHA1
Update resource id to hash
Retain original id in id_prev_aa

TODO: is SHA1 the best hashing algorithm for this?
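
The resource id steps can be sketched in Python as below. The helper names are hypothetical, and two details are assumptions not stated above: the digest is hex-encoded, and a trailing slash on the base URL is ignored.

```python
import hashlib


def strip_scheme(url):
    # "https://example.org/fhir/Patient/1" -> "example.org/fhir/Patient/1"
    return url.split("://", 1)[-1]


def hash_resource_id(base_url, resource_type, resource_id):
    # Build the full URL, drop the scheme, then hash with SHA-1
    url = f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"
    return hashlib.sha1(strip_scheme(url).encode("utf-8")).hexdigest()


def annotate_resource_id(resource, base_url):
    """Return a copy with the hashed id and the original id in id_prev_aa."""
    annotated = dict(resource)
    annotated["id_prev_aa"] = resource["id"]
    annotated["id"] = hash_resource_id(
        base_url, resource["resourceType"], resource["id"]
    )
    return annotated
```

Because the scheme is removed before hashing, the same resource served over `http` and `https` receives the same hashed id.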

Narrative and Markdown

Omit from analytic dataset by default with option to include Narrative (optional inclusion not yet implemented in prototype)

Contained Resources

Build URL with base url, resourceType, parent resource id, and contained resource id
Remove URL scheme
Hash with SHA1
Update resource id
Retain original id in id_prev_aa
Extract from parent resource
Update internal references in former parent to new id

TODO: is SHA1 the best hashing algorithm for this?
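
One possible reading of these steps is sketched below in Python. The function names are hypothetical. The URL layout `base/ParentType/parentId#containedId` is an assumption, as is the hex-encoded digest; the sketch also assumes it runs before the parent's own id has been hashed.

```python
import copy
import hashlib


def _hash(url):
    # Remove the scheme, then hash with SHA-1 (hex digest is an assumption)
    return hashlib.sha1(url.split("://", 1)[-1].encode("utf-8")).hexdigest()


def _rewrite_refs(node, id_map):
    # Walk the former parent in place, replacing "#localId" references
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and ref in id_map:
            node["reference"] = id_map[ref]
        for child in node.values():
            _rewrite_refs(child, id_map)
    elif isinstance(node, list):
        for child in node:
            _rewrite_refs(child, id_map)


def extract_contained(parent, base_url):
    """Return (parent without 'contained', list of extracted resources)."""
    parent = copy.deepcopy(parent)  # leave the caller's resource untouched
    extracted, id_map = [], {}
    for res in parent.pop("contained", []):
        url = (f"{base_url.rstrip('/')}/{parent['resourceType']}/"
               f"{parent['id']}#{res['id']}")
        new_id = _hash(url)
        id_map["#" + res["id"]] = f"{res['resourceType']}/{new_id}"
        extracted.append({**res, "id": new_id, "id_prev_aa": res["id"]})
    _rewrite_refs(parent, id_map)
    return parent, extracted
```

Including the parent id in the hashed URL keeps contained resources with the same local id (for example `med1` in two different MedicationRequests) from colliding after extraction.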

Reference

If absolute URL and the base matches the FHIR server base URL (i.e. not an external reference): id = hash of (URL without scheme)
If relative URL: id = hash of (base URL without scheme + relative URL)
If contained URL: id = hash of (base URL without scheme + relative URL + "#" + relative id)
Store previous Reference.reference as reference_prev_aa
Update Reference.reference to [resourceType]/[hashed id]
Populate Reference.type if not populated
Populate reference_id_aa with the hashed id
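
The reference rules can be sketched in Python as follows. This is illustrative only: `annotate_reference` is a hypothetical name, the digest is assumed to be hex-encoded, external references are assumed to be left unchanged, and versioned references (`_history`) are not handled. For contained references, the `parent` argument is assumed to be the enclosing resource with its original id and its `contained` list still in place.

```python
import hashlib


def _strip_scheme(url):
    return url.split("://", 1)[-1]


def annotate_reference(ref, base_url, parent=None):
    """Return an annotated copy of a FHIR Reference dict."""
    value = ref.get("reference")
    if not value:
        return ref
    base = _strip_scheme(base_url).rstrip("/")
    if value.startswith("#"):
        # Contained: resolve the type from the parent's contained list
        match = next((c for c in (parent or {}).get("contained", [])
                      if c.get("id") == value[1:]), None)
        if match is None:
            return ref
        target = f"{base}/{parent['resourceType']}/{parent['id']}{value}"
        resource_type = match["resourceType"]
    elif "://" in value:
        # Absolute: only rewrite when it points at this server
        target = _strip_scheme(value)
        if not target.startswith(base + "/"):
            return ref  # external reference, left as-is (assumption)
        resource_type = target[len(base) + 1:].split("/")[0]
    else:
        # Relative: prefix with the base URL
        target = f"{base}/{value}"
        resource_type = value.split("/")[0]
    hashed = hashlib.sha1(target.encode("utf-8")).hexdigest()
    out = dict(ref)
    out["reference_prev_aa"] = value
    out["reference"] = f"{resource_type}/{hashed}"
    out.setdefault("type", resource_type)  # only if not already populated
    out["reference_id_aa"] = hashed
    return out
```

In this sketch the relative reference `Patient/123` and the absolute reference `https://example.org/fhir/Patient/123` hash the same string, so both resolve to the same `reference_id_aa`.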

Extensions and Modifier Extensions

Flattened into a record structure for easy querying.

Note that BigQuery uses a typed schema and limits the number of fields in a table, so only FHIR types pre-defined for that extension path are included in the record.

Make all extension URLs absolute
Flatten to the following table, replacing the extension element:
```
extension []
  parent (eg. 0.1.2)
  url (absolute)
  value[x]
```
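
A rough Python sketch of the flattening is shown below. The function name is hypothetical, and the meaning of `parent` is an assumption: here it is the dotted index path of the enclosing extension, with top-level extensions having no parent. Nested extension URLs are made absolute by appending them to the enclosing extension's URL. The sketch copies any `value[x]` element it finds and does not apply the per-path type filtering described above.

```python
def flatten_extensions(extensions, parent_url=None, parent_path=None):
    """Flatten nested FHIR extensions into a list of flat rows."""
    rows = []
    for index, ext in enumerate(extensions):
        url = ext["url"]
        if "://" not in url and parent_url:
            # Nested extension URLs are relative to the enclosing extension
            url = f"{parent_url}/{url}"
        path = str(index) if parent_path is None else f"{parent_path}.{index}"
        row = {"parent": parent_path, "url": url}
        # Copy the value[x] element, if any (e.g. valueCoding, valueString)
        row.update({k: v for k, v in ext.items() if k.startswith("value")})
        rows.append(row)
        # Children follow their parent and point back to it via `parent`
        rows.extend(flatten_extensions(ext.get("extension", []), url, path))
    return rows
```

Applied to an `argo-race` extension with nested `ombCategory` and `text` children, this yields three rows, with the children carrying URLs ending in `/argo-race/ombCategory` and `/argo-race/text`.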

Example queries:

	SELECT * 
		FROM Patient,
		UNNEST (extension) AS pt_extension
		WHERE pt_extension.url = "http://fhir.org/guides/argonaut/StructureDefinition/argo-race/ombCategory"
			AND pt_extension.valueCoding.system = "http://fhir.org/guides/argonaut/v3/Race"
			AND pt_extension.valueCoding.code = "1002-5"

	SELECT * 
		FROM Patient,
		UNNEST (extension) AS pt_extension
		WHERE pt_extension.url = "http://fhir.org/guides/argonaut/StructureDefinition/argo-race/text"
			AND pt_extension.valueString_aa LIKE "%MIXED%"

Recursive Structures

A few FHIR structures can be nested without limit and need to be bounded to fit in BigQuery and other schema-based data stores.

Extensions can contain extensions (handled by flattening extensions as described above)
Extensions can contain complex types that can contain extensions (currently handled by omitting these)
Complex types can contain other complex types (eg. a Reference includes an Identifier, which has an assigner that is itself a Reference). This is handled by limiting recursion to a pre-defined number of levels by path (prototype defaults to 3).
Content references that are circular (eg. Questionnaire.item.item). This is handled by limiting recursion to a pre-defined number of levels by path (prototype defaults to 3).
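
The depth limit can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, it keys on a single element name rather than a full path, and the way levels are counted (the first occurrence of the element is level 1) may differ from the prototype.

```python
def limit_recursion(node, key, max_levels=3, level=1):
    """Return a copy of node with `key` kept for at most max_levels levels."""
    if isinstance(node, list):
        return [limit_recursion(n, key, max_levels, level) for n in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for name, value in node.items():
        if name == key:
            if level > max_levels:
                continue  # deeper than the limit: omit this element
            out[name] = limit_recursion(value, key, max_levels, level + 1)
        else:
            out[name] = limit_recursion(value, key, max_levels, level)
    return out
```

For a Questionnaire with items nested four deep, calling `limit_recursion(questionnaire, "item")` keeps the first three levels of `item` and drops the fourth.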

Address - not currently implemented in prototype

Normalize city, district, state and country by converting to uppercase and fixing abbreviations where possible
Optionally geocode

Add to Address type:

city_aa
district_aa
state_aa
country_aa
geocode_aa
  longitude decimal Longitude with WGS84 datum
  latitude decimal Latitude with WGS84 datum
  altitude decimal Altitude with WGS84 datum

Extensions on Primitive Types - not currently implemented in prototype

If needed, primitive extensions could be flattened using a similar approach to that described above for other extensions.

Open Questions

Is it worth grouping similar choice types (eg. date, dateTime, instant) into one type for querying or should this be handled by the query generator checking for the existence of each type?
Support inclusion of timezone offset extension (https://www.hl7.org/fhir/extension-tz-offset.json.html) on date and time elements rather than assuming UTC?
Build a schema for each upload and tailor it to the data, or build a general purpose schema (as is currently done)? Pro: would transparently support extensions and primitive extensions. Con: clients would have to check whether fields exist prior to querying.
Is performance sufficient to do interval packing via window queries or does aggregation need to be done as a transformation on load (eg. merging multiple medication orders into a drug exposure "era")?
