PowerQuery Functions
This page documents a number of PowerQuery functions that perform common data processing tasks on HPC data, and that can be used in Power BI or Excel.
The GetLocationsById function takes a Service URL (e.g. https://api.hpc.tools) and a list of location IDs, and produces a detailed table of all locations in HPC up to admin level 3 (including deactivated locations). For each location, it includes the admin0, admin1, admin2 and admin3 IDs, names and pcodes of all of its parents, as well as the location's own id, name and pcode.
This is useful, for example, to look up locations that are used in disaggregated caseloads or indicators throughout HPC.
(Service as text, locationIDs as list) => let
Data = List.Transform(locationIDs, (locationId) => Json.Document(Web.Contents(Service & "/v2/public/location/" & Number.ToText(locationId) & "?maxLevel=3&status=all"))[data]),
Countries = Table.FromList(Data, Splitter.SplitByNothing(), {"Admin0"}),
ProcessAdminLevel = (
baseTable as table,
adminLevel as number,
calculateName as function
) => let
namespace = "Admin" & Number.ToText(adminLevel),
// Add columns that will be used across all descendant admin levels
addId = Table.AddColumn(baseTable, namespace & "Id", each Record.Field(_, namespace)[id]),
addName = Table.AddColumn(addId, namespace & "Name", each Record.Field(_, namespace)[name]),
addPcode = Table.Buffer(Table.AddColumn(addName, namespace & "Pcode", each Record.Field(_, namespace)[pcode])),
// Add columns that are specifically for locations that are the current admin level
addFinalId = Table.AddColumn(addPcode, "id", each Record.Field(_, namespace)[id]),
addFinalName = Table.AddColumn(addFinalId, "name", calculateName),
addFinalPCode = Table.AddColumn(addFinalName, "pcode", each Record.Field(_, namespace)[pcode]),
addFinalAdminLevel = Table.AddColumn(addFinalPCode, "AdminLevel", each adminLevel),
// Used in next base
childrenColumn = "Admin" & Number.ToText(adminLevel + 1),
addChildren = Table.Buffer(Table.AddColumn(addPcode, childrenColumn, each Record.Field(_, namespace)[children])),
expandChildren = Table.SelectRows(Table.ExpandListColumn(addChildren, childrenColumn), each Record.Field(_, childrenColumn) <> null)
in
[
nextBase = expandChildren,
final = addFinalAdminLevel
],
Admin0 = ProcessAdminLevel(Countries, 0, each [Admin0][name]),
Admin1 = ProcessAdminLevel(Admin0[nextBase], 1, each [Admin0][name] & " -> " & [Admin1][name]),
Admin2 = ProcessAdminLevel(Admin1[nextBase], 2, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name]),
Admin3 = ProcessAdminLevel(Admin2[nextBase], 3, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name] & " -> " & [Admin3][name]),
Result = Table.Combine({
Admin0[final],
Admin1[final],
Admin2[final],
Admin3[final]
})
in
Result
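As a sketch of how this function might be invoked, assuming it has been saved as a query named GetLocationsById (the location IDs below are placeholders, not real examples from HPC):
let
    // Placeholder location IDs - replace with the IDs you actually need
    SelectedLocations = GetLocationsById("https://api.hpc.tools", {1, 2, 3})
in
    SelectedLocations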
The GetLocationsByISO function accepts a list of ISO3 country codes, and returns all the locations that exist in HPC up to admin level 3 for the countries listed. It makes use of the GetLocationsById function above.
(Service as text, locationISOs as list) => let
AllLocations = Json.Document(Web.Contents(Service & "/v2/public/location"))[data],
FilteredLocations = List.Select(AllLocations, each List.Contains(locationISOs, [iso3])),
FilteredLocationIDs = List.Transform(FilteredLocations, each [id])
in
GetLocationsById(Service, FilteredLocationIDs)
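As a sketch, assuming the query is named GetLocationsByISO (AFG and SDN are the ISO3 codes for Afghanistan and Sudan), and wrapping the result in Table.Buffer since location tables are typically reused for many lookups:
let
    // Buffer the result so it is fetched once and can be reused for lookups
    Locations = Table.Buffer(GetLocationsByISO("https://api.hpc.tools", {"AFG", "SDN"}))
in
    Locations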
This function takes a caseload or indicator attachment JSON record and parses its disaggregated data, producing a table with a separate row for each (location, category, metric) combination.
It requires a table of locations (produced by either GetLocationsById or GetLocationsByISO), which is used to look up each location's name and pcode by ID.
(attachment as record, locations as table) => let
Values = attachment[attachmentVersion][value][metrics][values],
Metrics = Values[totals],
Disaggregated = if Record.HasFields(Values, "disaggregated") then Values[disaggregated] else null
in
if Disaggregated = null then null else (
let
Categories = Disaggregated[categories],
Locations =
if List.Count(Disaggregated[dataMatrix]) = List.Count(Disaggregated[locations]) + 1 then
Disaggregated[locations]
else
error Error.Record("Mismatch of rows and location length"),
RowIndexes = {0..(List.Count(Disaggregated[dataMatrix]) - 1)},
InitialTable = Table.FromList(RowIndexes, Splitter.SplitByNothing(), { "RowIndex" }),
AddRow = Table.AddColumn(InitialTable, "row", each Disaggregated[dataMatrix]{[RowIndex]}),
AddColIndex = Table.AddColumn(AddRow, "ColIndex", each {0..(List.Count([row]) - 1)}),
ExpandColIndex = Table.ExpandListColumn(AddColIndex, "ColIndex"),
AddCellValue = Table.AddColumn(ExpandColIndex, "Value", each [row]{[ColIndex]}),
FilterNullValues = Table.SelectRows(AddCellValue, each ([Value] <> null and [Value] <> 0 and [Value] <> "")),
AddMetricIndex = Table.AddColumn(FilterNullValues, "MetricIndex", each Number.Mod([ColIndex], List.Count(Metrics))),
AddCategoryIndex = Table.AddColumn(AddMetricIndex, "CategoryIndex", each Number.RoundDown([ColIndex] / List.Count(Metrics))),
AddLocationRef = Table.AddColumn(AddCategoryIndex, "LocationRef", each Locations{[RowIndex] - 1}),
AddCategory = Table.AddColumn(AddLocationRef, "Category", each if [CategoryIndex] < List.Count(Categories) then Categories{[CategoryIndex]}[label] else "Total"),
AddMetric = Table.AddColumn(AddCategory, "Metric", each Metrics{[MetricIndex]}),
AddLocationId = Table.AddColumn(AddMetric, "LocationId", each [LocationRef][id]),
AddLocation = Table.AddColumn(AddLocationId, "Location", each Table.SelectRows(locations, (loc) => loc[id] = [LocationId]){0}),
AddLocationName = Table.AddColumn(AddLocation, "LocationName", each try [Location][name] otherwise "Broken Location: " & [LocationRef][name] ),
AddLocationPcode = Table.AddColumn(AddLocationName, "LocationPcode", each try [Location][pcode] otherwise "Broken Location: " & [LocationRef][name]),
AddLocationAdminLevel = Table.AddColumn(AddLocationPcode, "LocationAdminLevel", each try [Location][AdminLevel] otherwise "Broken Location: " & [LocationRef][name]),
AddMetricType = Table.AddColumn(AddLocationAdminLevel, "MetricType", each [Metric][type]),
AddMetricName = Table.AddColumn(AddMetricType, "MetricName", each [Metric][name][en]),
Result = Table.SelectColumns(AddMetricName, {"Location", "LocationName", "LocationPcode", "LocationAdminLevel", "Category", "MetricType", "MetricName", "Value"})
in
Result
)
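A sketch of how these pieces fit together, assuming the attachment record has already been retrieved from the HPC API in another query named Attachment, and that the function above has been saved under the illustrative name ParseAttachmentDisaggregation:
let
    // Location lookup table, buffered because it is queried once per disaggregated row
    Locations = Table.Buffer(GetLocationsByISO("https://api.hpc.tools", {"AFG"})),
    // "Attachment" is assumed to be an attachment record fetched elsewhere;
    // "ParseAttachmentDisaggregation" is an illustrative name for the function above
    Disaggregation = ParseAttachmentDisaggregation(Attachment, Locations)
in
    Disaggregation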
The HPC API makes use of pagination for many of its endpoints (such as searching for flows or projects). This means that, to get a complete set of certain parts of our data, you need to make multiple requests to the API, one for each page of data.
This is a common practice in API design: it prevents requests becoming too large for unbounded data-sets, or taking too long to serve.
You can create queries in PowerQuery that dynamically make the appropriate number of requests to retrieve a complete data-set by using recursion.
We use the following utility function (GetPaginatedData) to help us create such queries:
(Get as function, args) => let
data = Get(args),
nextResults = if data[nextArgs] <> null then @GetPaginatedData(Get, data[nextArgs]) else {}
in
List.Combine({data[items], nextResults})
The first argument (Get) needs to be a function that:
- accepts an argument that provides enough data for the function to craft an API request for a single page (e.g. the URL, or page ID)
- returns a Record with the properties:
  - items: a List of the items for the requested page
  - nextArgs: either:
    - null (when the current page is the last page, and no more requests are needed)
    - a value that will be passed into the Get function to get the next page
The second argument (args) needs to be a value that represents the first page
you would like to fetch, and is passed to Get for the first request.
Note: when fetching data like this, you ALMOST ALWAYS want to use List.Buffer
to prevent making an unnecessary number of queries to the server.
(see examples below)
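For instance, a minimal sketch of a Get function that satisfies this contract, written against a hypothetical endpoint that paginates by page number (this is not a real HPC endpoint, and the items and hasMore fields are assumed response fields):
let
    // Hypothetical page-number-based endpoint; "items" and "hasMore" are assumed fields of its JSON response
    GetPage = (page as number) => let
        json = Json.Document(Web.Contents("https://example.org/api/items?page=" & Number.ToText(page)))
    in
        [
            items = json[items],
            // returning null here stops the recursion in GetPaginatedData
            nextArgs = if json[hasMore] then page + 1 else null
        ],
    // Buffer the combined list so it is only fetched once
    AllItems = List.Buffer(GetPaginatedData(GetPage, 1))
in
    AllItems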
The GetAllFlows function below makes use of GetPaginatedData to get all of the flows matching a particular query.
Note the use of limit=1000 to increase the number of results per page (1000 is the maximum for this endpoint); this reduces the total number of requests that need to be made.
(query as text) => let
GetFlowPage = (url as text) => let
data = Binary.Buffer(Web.Contents(url)),
json = Json.Document(data)
in
[
items = json[data][flows],
nextArgs = if Record.HasFields(json[meta], "nextLink") then json[meta][nextLink] else null
]
in
List.Buffer(GetPaginatedData(
GetFlowPage,
"https://api.hpc.tools/v1/public/fts/flow?limit=1000&" & query
))
Notes:
- The query argument takes the parameters that you would put after the ? in the URL. You can add multiple arguments using & as normal.
- You may be surprised that we pass in a URL rather than a page ID for the arguments of this function. The reason we do this is that the API handily provides the URL that we need to use to get the next page of data, so we can use that instead, and simply provide the first URL of the sequence.
So for example, to get all flows for 2019, you would use it like so:
GetAllFlows("year=2019")
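Since GetAllFlows returns a list of flow records, here is a sketch of loading that list into a table for further processing (the step names are illustrative):
let
    Flows2019 = GetAllFlows("year=2019"),
    // Table.FromRecords infers the columns from the flow records; if the flows don't all
    // share the same fields, you may need an explicit column list or MissingField.UseNull
    FlowTable = Table.FromRecords(Flows2019)
in
    FlowTable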
We have recently announced a number of upcoming changes to the HPC API,
including the development of v4 of the API that will use GraphQL at its core.
These changes are at the very early planning and development stages at the
moment, and are not yet ready to be tested or used, so for the time being we
recommend continuing to use the v2 endpoints that are documented in the
Swagger documentation at https://api.hpc.tools/docs/v2/ and in this wiki.
If you'd like to keep updated as to when these new endpoints will become available for early user feedback or use, please see the section below.
We have introduced some mailing lists that we will now be using to keep API users up-to-date, and engage with users for early testing and feedback of new API features.
If you would like to join either of these mailing lists, please send an email to [email protected] with your request:
- This mailing list will be used for official announcements from the HPC.tools team regarding changes in the API, including new features and deprecation notices.
- This mailing list allows for open discussion between different users of the HPC API, as well as the development team. Feel free to use this mailing list to ask for advice on how to use the API. We will be using this mailing list to solicit early feedback for the v4 API as we continue to plan and develop it.