feat: add a notebook to get the list of all fields used by publications in the Registry
yolile committed Dec 3, 2024
1 parent 600edac commit e03a807
Showing 4 changed files with 523 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -43,7 +43,7 @@ Notebook | Open in Colab | Description
[Relevant checks for all the Data Registry publications](https://github.com/open-contracting/notebooks-ocds/blob/main/template_relevant_checks_registry_all.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/open-contracting/notebooks-ocds/blob/main/template_relevant_checks_registry_all.ipynb) | Provide feedback on data relevance downloading all the publications from the [Data Registry](https://data.open-contracting.org/).
[Red flags checks using the Data Registry](https://github.com/open-contracting/notebooks-ocds/blob/main/template_red_flags_checks_registry.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/open-contracting/notebooks-ocds/blob/main/template_red_flags_checks_registry.ipynb) | Provide feedback on coverage for red flags using data from the [Data Registry](https://data.open-contracting.org/).
[Red flags checks using a field list](https://github.com/open-contracting/notebooks-ocds/blob/main/template_red_flags_checks_fieldlist.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/open-contracting/notebooks-ocds/blob/main/template_red_flags_checks_fieldlist.ipynb) | Provide feedback on red flags for prospective OCDS publishers, using a field list, like from a field-level mapping.

[Field list for all the Data Registry publications](https://github.com/open-contracting/notebooks-ocds/blob/main/template_field_list_registry_all.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/open-contracting/notebooks-ocds/blob/main/template_field_list_registry_all.ipynb) | Extract the fields published by all the publications from the [Data Registry](https://data.open-contracting.org/).

## Contributing

141 changes: 141 additions & 0 deletions component_get_field_list_all_registry.ipynb
@@ -0,0 +1,141 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Get the fields used by all OCDS publications in the Registry\n",
"\n",
"Use this notebook to list the fields implemented by every publisher in the Data Registry, for example, to check which publishers publish a specific field."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# @title Get all the publications from the registry { display-mode: \"form\" }\n",
"\n",
"publications = get_publications()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Download all the publications, using the latest file available"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"today = datetime.now(tz=timezone.utc)\n",
"results = []\n",
"for publication in publications:\n",
"    if publication[\"date_to\"]:\n",
"        year = publication[\"date_to\"][:4]\n",
"        if int(year) > today.year:\n",
"            year = today.year\n",
"    else:\n",
"        year = \"full\"\n",
"    download_file(publication, year)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Extract the list of fields using cardinal"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"final_dataset = pd.DataFrame()\n",
"\n",
"for file in os.listdir(\".\"):\n",
"    if file.endswith(\".jsonl\"):\n",
"        publisher = file.replace(\".jsonl\", \"\")\n",
"        coverage = !./ocdscardinal coverage $file\n",
"        data = (\n",
"            pd.DataFrame.from_dict(json.loads(coverage[0]), orient=\"index\", columns=[\"count\"])\n",
"            .reset_index()\n",
"            .rename(columns={\"index\": \"path\"})\n",
"        )\n",
"        data[\"publisher\"] = publisher\n",
"        final_dataset = pd.concat([final_dataset, data])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"final_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Export the results as CSV"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"final_dataset.to_csv(\"ocds_fields_from_all_publishers.csv\", index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
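
The notebook's two data-handling steps (choosing which yearly file to download, and flattening the coverage output into rows) can be sketched outside Colab as plain Python. This is a minimal sketch under assumptions, not the notebook's code: `pick_year` and `coverage_rows` are hypothetical helper names, the sample JSON is made up, and the coverage output is assumed to be a single JSON object mapping field paths to counts, as the `json.loads(coverage[0])` call suggests. Unlike the notebook, `pick_year` returns an integer for every dated publication rather than a string slice.

```python
import json
from datetime import datetime, timezone


def pick_year(date_to, today=None):
    """Return the year of the file to request, or "full" when no end date is set.

    Hypothetical helper: mirrors the notebook's loop body, but always
    returns an int for dated publications.
    """
    today = today or datetime.now(tz=timezone.utc)
    if not date_to:
        return "full"
    # An end date in the future is capped at the current year.
    return min(int(date_to[:4]), today.year)


def coverage_rows(coverage_json, publisher):
    """Flatten a {path: count} JSON object into one row per field path.

    Assumes the coverage output is one JSON object (an assumption based
    on how the notebook parses it).
    """
    counts = json.loads(coverage_json)
    return [
        {"path": path, "count": count, "publisher": publisher}
        for path, count in counts.items()
    ]


# Made-up sample input, for illustration only.
sample = '{"/ocid": 10, "/tender/id": 7}'
rows = coverage_rows(sample, "example_publisher")
```

Collecting plain rows first and building a single table at the end avoids growing a DataFrame inside the loop, though for a few dozen publications either approach should be fine.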
6 changes: 6 additions & 0 deletions manage.py
@@ -122,6 +122,12 @@
"component_setup_usability",
"component_check_red_flags_external",
],
"template_field_list_registry_all": [
"component_environment",
"component_setup_cardinal",
"component_setup_download_data_from_registry",
"component_get_field_list_all_registry",
],
}

BASEDIR = Path(__file__).resolve().parent
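
The diff shows only the new mapping entry, not how `manage.py` consumes it. Assuming each template is built by concatenating the cells of its listed components in order, the idea might look like the sketch below. `build_template`, `load_component`, and the in-memory `fake_components` are hypothetical names and data introduced for illustration; they are not taken from the repository.

```python
def build_template(component_names, load_component):
    """Concatenate the cells of each component notebook, in list order.

    Assumption: a template is the ordered union of its components' cells.
    `load_component` is a hypothetical callable that returns a notebook
    dict for a given component name.
    """
    cells = []
    for name in component_names:
        cells.extend(load_component(name)["cells"])
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


# Made-up stand-ins for two component notebooks.
fake_components = {
    "component_environment": {
        "cells": [{"cell_type": "code", "source": ["import os"]}]
    },
    "component_get_field_list_all_registry": {
        "cells": [{"cell_type": "code", "source": ["final_dataset"]}]
    },
}

notebook = build_template(
    ["component_environment", "component_get_field_list_all_registry"],
    fake_components.__getitem__,
)
```

Under that assumption, the order of the list matters: setup components have to come before the component that uses the names they define.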