
file listings

jvandegriff edited this page Mar 23, 2026 · 19 revisions

See also dataset schema issue tag.

File listings and events lists are similar kinds of data, and both can be represented in HAPI. HAPI has been able to represent files for some time, using the "stringType" metadata to identify strings as URIs, so a file listing has informally been just a time and a URI. We would now like to define file listings as a specific schema in HAPI. This document explores that idea.

Events lists are, more generally, just a time stamp and a message associated with that time stamp. Often each event has a time range, with a start and an end time.

We start with a base class which is just a "String Listing":

  • time isotime
  • string

"File Listing" required elements:

  • time - start of time coverage (isotime, required)
  • fileURI (string, required, stringType used for base URI)

Optional and recommended elements:

  • modificationDate (isotime, recommended)
  • fileSize (recommended; if present, the type is integer and the units must be one of: "B", "KB", "KiB", "MB", "MiB", "GB", "GiB", "TB", "TiB", "PB", "PiB", "EB", "EiB")

Optional elements (if present, use these keywords):

  • checkSum (stringType used to constrain checkSumAlgorithm)
  • creationDate (isotime)
  • accessDate (isotime)
  • stopDate - stop of time coverage (isotime)

Ordering of "fileListing" columns:

  • required columns must be present in the order given (time, then fileURI)
  • optional columns must follow required columns and can be in any order
  • any number of user-added columns can be present (other than the listed optional columns) and these can be interleaved among the optional columns
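The ordering rules above can be sketched as a small check. This is only an illustration, not part of the proposal: the function name is hypothetical, and the case-insensitive match on the first column is an assumption made because the example metadata in this page names that parameter "Time" rather than "time".

```python
def check_file_listing_columns(names):
    """Return a list of problems; an empty list means the ordering rules hold.

    names: parameter names in the order they appear in the dataset.
    """
    problems = []

    # Rule 1: the required columns come first, in the order time, fileURI.
    # Assumption: the time column is matched case-insensitively ("Time" or "time").
    head = [n.lower() if i == 0 else n for i, n in enumerate(names[:2])]
    if head != ["time", "fileURI"]:
        problems.append(f"first two columns must be time, fileURI; got {names[:2]}")

    # Rules 2 and 3: optional and user-added columns may follow in any order and
    # may be interleaved, so the only remaining check is that no name repeats.
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"duplicate column: {name}")
        seen.add(name)

    return problems


# A user-added column (x_camera) interleaved among optional columns is allowed.
print(check_file_listing_columns(
    ["Time", "fileURI", "fileSize", "x_camera", "modificationDate"]))
```

Because optional and user-added columns are unordered, a client would presumably locate them by name rather than by position.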

A new stringType is needed for checksums; see ticket #273. There are curated lists of hash algorithm names (for use in HTTP headers, for example).

A long integer type would be helpful here, but that should be a separate discussion (and maybe we also add float for HAPI 4.0; also complex numbers?).

Question (analysis needed): how many units-processing libraries use the same strings for these file size units?

Examples of standards for prefixes used with file sizes

We will eventually have to specify which standard we use for these prefixes.
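To make the choice concrete, here is a minimal sketch of how a client might convert a fileSize value to bytes. It assumes the IEC convention, in which "KB", "MB", ... are powers of 1000 and "KiB", "MiB", ... are powers of 1024; this page has not yet settled on a standard, so the factors below are an assumption, and the names BYTES_PER_UNIT and file_size_in_bytes are hypothetical.

```python
def _build_factors():
    """Map each allowed fileSize unit string to a number of bytes."""
    factors = {"B": 1}
    for power, letter in enumerate("KMGTPE", start=1):
        factors[letter + "B"] = 1000 ** power    # decimal prefix (assumed)
        factors[letter + "iB"] = 1024 ** power   # binary prefix (assumed)
    return factors


BYTES_PER_UNIT = _build_factors()


def file_size_in_bytes(value, units):
    """Convert an integer fileSize with the given units string to bytes."""
    if units not in BYTES_PER_UNIT:
        raise ValueError(f"unsupported fileSize units: {units!r}")
    return value * BYTES_PER_UNIT[units]


print(file_size_in_bytes(3, "KiB"))  # 3 * 1024
print(file_size_in_bytes(3, "KB"))   # 3 * 1000
```

Note that the SI spelling of the decimal kilobyte is "kB", which this table would reject; whether to accept it is one of the things a chosen standard would decide.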

Events Lists are also extensions of String Listings:

  • time - start of time coverage (isotime, required)
  • stopDate - stop of time coverage (isotime, required). The documentation acknowledges that this should be the same as time for an instantaneous event.
  • label (required)
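As a rough illustration of these three required elements, the sketch below checks one event record. The helper names are hypothetical, and the time parsing is deliberately narrow: it assumes calendar-date timestamps with seconds and a trailing "Z", and does not cover every form that isotime allows.

```python
from datetime import datetime


def parse_isotime(text):
    """Parse a timestamp such as 2023-01-01T00:00:00Z (assumed form only)."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def check_event(time, stop_date, label):
    """Return a list of problems with one event record; empty means it looks valid."""
    problems = []
    if not label:
        problems.append("label is required")
    if parse_isotime(stop_date) < parse_isotime(time):
        problems.append("stopDate precedes time")
    return problems


# An instantaneous event: stopDate equals time.
print(check_event("2023-01-01T00:00:00Z", "2023-01-01T00:00:00Z", "flare"))
# An event with a duration.
print(check_event("2023-01-01T00:00:00Z", "2023-01-01T00:10:00Z", "eclipse"))
```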

Some example extensions to Event List:

  • latitude
  • longitude

Example proposed output (note x_parameterSchema):

{
    "HAPI": "3.2",
    "x_createdAt": "2017-02-21T17:27Z",
    "modificationDate": "2026-01-01T00:00Z",
    "x_parameterSchema": "list>fileList>jpgFileList",
    "parameters": [
        {
            "length": 20,
            "name": "Time",
            "type": "isotime",
            "x_format": "$Y-$m-$dT$H:$M:$SZ",
            "fill": null,
            "units": "UTC",
            "timeStampLocation" : "begin"
        },
        {
            "description": "Picture of the creek, unmodified",
            "fill": null,
            "name": "fileURI",
            "length": 26,
            "type": "string",
            "units": null,
            "stringType": {
                "uri": {
                    "base": "https://cottagesystems.com/data/hapi/pics/",
                    "mediaType": "image/jpeg"
                }
            }
        },
        {
            "description": "File modification time",
            "name": "modificationDate",
            "type": "isotime",
            "fill": null,
            "x_format": "$Y-$m-$dT$H:$MZ",
            "length": 17,
            "units": "UTC"
        },
        {
            "description": "File size in kibibytes",
            "name": "fileSize",
            "fill": null,
            "type": "integer",
            "units": "KiB"
        }
    ],
    "sampleStartDate": "2023-01-01T00:00Z",
    "sampleStopDate": "2023-02-01T00:00Z",
    "startDate": "2022-11-01T00:00Z",
    "stopDate": "2026-03-06T00:00Z",
    "cadence": "PT10M",
    "status": {
        "code": 1200,
        "message": "OK"
    }
}
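A client reading such a listing might combine each fileURI value with the base given in stringType. The sketch below is one possible reading, not a defined behavior: it assumes the base is simply prepended to each value, that data arrive as headerless CSV with one column per parameter, and it uses a made-up file name and record. Only the parameter names and the base URI come from the example above.

```python
import csv
import io

# Trimmed copy of the example metadata: only the fields this sketch needs.
info = {
    "parameters": [
        {"name": "Time", "type": "isotime"},
        {
            "name": "fileURI",
            "type": "string",
            "stringType": {
                "uri": {
                    "base": "https://cottagesystems.com/data/hapi/pics/",
                    "mediaType": "image/jpeg",
                }
            },
        },
        {"name": "modificationDate", "type": "isotime"},
        {"name": "fileSize", "type": "integer", "units": "KiB"},
    ]
}

# Hypothetical CSV record; the file name and values are invented for illustration.
sample_csv = "2023-01-01T00:00:00Z,20230101T000000.jpg,2023-01-01T00:05Z,412\n"


def resolve_uris(info, csv_text):
    """Return one dict per CSV row, with URI columns expanded using their base."""
    names = [p["name"] for p in info["parameters"]]

    # Collect the base for every parameter whose stringType is a uri object.
    bases = {}
    for p in info["parameters"]:
        string_type = p.get("stringType")
        if isinstance(string_type, dict) and "uri" in string_type:
            bases[p["name"]] = string_type["uri"].get("base", "")

    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        record = dict(zip(names, row))
        for name, base in bases.items():
            record[name] = base + record[name]  # assumption: base is prepended
        records.append(record)
    return records


records = resolve_uris(info, sample_csv)
print(records[0]["fileURI"])
```

All values stay as strings here; converting fileSize to an integer or parsing the times is left out to keep the sketch short.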

One issue is how to deal with the units on the file size. We could use the IEEE units, which seem to be similar to (the same as?) those used in VO units and astropy units: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9714443

