-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: add a notebook to get the list of all fields used by publicatio…
…ns in the Registry
- Loading branch information
Showing
4 changed files
with
523 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"## Get the fields used by all OCDS publications in the Registry\n", | ||
"\n", | ||
"Use this notebook to get the list of the fields implemented by all the publishers in the Data Registry, for example, to check what publishers are publishing specific fields." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# @title Get all the publications from the registry { display-mode: \"form\" }\n", | ||
"\n", | ||
"publications = get_publications()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"### Download all the publications, using the latest file available" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"today = datetime.now(tz=timezone.utc)\n", | ||
"results = []\n", | ||
"for publication in publications:\n", | ||
" if publication[\"date_to\"]:\n", | ||
" year = publication[\"date_to\"][:4]\n", | ||
" if int(year) > today.year:\n", | ||
" year = today.year\n", | ||
" else:\n", | ||
" year = \"full\"\n", | ||
" download_file(publication, year)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"### Extract the list of fields using cardinal" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"final_dataset = pd.DataFrame()\n", | ||
"\n", | ||
"for file in os.listdir(\".\"):\n", | ||
" if file.endswith(\".jsonl\"):\n", | ||
" publisher = file.replace(\".jsonl\", \"\")\n", | ||
" coverage = !./ocdscardinal coverage $file\n", | ||
" data = (\n", | ||
" pd.DataFrame.from_dict(json.loads(coverage[0]), orient=\"index\", columns=[\"count\"])\n", | ||
" .reset_index()\n", | ||
" .rename(columns={\"index\": \"path\"})\n", | ||
" )\n", | ||
" data[\"publisher\"] = publisher\n", | ||
" final_dataset = pd.concat([final_dataset, data])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"final_dataset" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"Export the results as CSV" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"final_dataset.to_csv(\"ocds_fields_from_all_publishers.csv\", index=False)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 2 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython2", | ||
"version": "2.7.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.