create datapipe for resampled format #6

Open · wants to merge 6 commits into main

Changes from 2 commits
2 changes: 2 additions & 0 deletions .flake8
@@ -0,0 +1,2 @@
[flake8]
Member
For these more util-y kind of changes, especially since they are the same between this PR and the other one, I would make them their own PR and merge that instead, so this PR is just focused on the datapipe bit, and not the extra formatting bits.

max-line-length = 100
2 changes: 1 addition & 1 deletion .github/workflows/linters.yaml
@@ -6,4 +6,4 @@ jobs:
  call-run-python-linters:
    uses: openclimatefix/.github/.github/workflows/python-lint.yml@main
    with:
-      folder: "ocf_datapipes"
+      folder: "ukpn"
5 changes: 5 additions & 0 deletions .gitignore
@@ -127,3 +127,8 @@ dmypy.json

# Pyre type checker
.pyre/
tests/scripts/data
tests/data
pv-solar-farm-forecasting.code-workspace
.vscode
.gitattributes
59 changes: 59 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,59 @@
default_language_version:
  python: python3

ci:
  skip: [pydocstyle, flake8]

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      # list of supported hooks: https://pre-commit.com/hooks.html
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: debug-statements
      - id: detect-private-key

  # python code formatting/linting
  - repo: https://github.com/PyCQA/pydocstyle
    rev: 6.1.1
    hooks:
      - id: pydocstyle
        args:
          [
            --convention=google,
            "--add-ignore=D200,D202,D210,D212,D415,D105",
            "ukpn",
          ]
        files: ^ukpn/
  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args:
          [
            --max-line-length,
            "100",
            --extend-ignore=E203,
            --per-file-ignores,
            "__init__.py:F401",
            "ukpn",
          ]
        files: ^ukpn/
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.4
    hooks:
      - id: isort
        args: [--profile, black, --line-length, "100", "ukpn"]
  - repo: https://github.com/psf/black
    rev: 22.12.0
    hooks:
      - id: black
        args: [--line-length, "100"]

  # yaml formatting
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.0.0-alpha.4
    hooks:
      - id: prettier
        types: [yaml]
Empty file added conftest.py
Empty file.
30 changes: 30 additions & 0 deletions environment.yml
@@ -0,0 +1,30 @@
name: uk_pv_solar_farm_forecasting
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - pip
  - pytorch
  - rioxarray
  - torchdata
  - torchvision
  - xarray
  - fsspec
  - zarr
  - cartopy
  - dask
  - pyproj
  - pyresample
  - geopandas
  - h5netcdf
  - scipy
  - pip:
      - einops
      - pathy
      - git+https://github.com/SheffieldSolar/PV_Live-API
      - pyaml_env
      - nowcasting_datamodel
      - gitpython
      - tqdm
      - bottleneck
20 changes: 20 additions & 0 deletions pydoc-markdown.yml
@@ -0,0 +1,20 @@
loaders:
  - type: python
    search_path: [ukpn/]
processors:
  - type: filter
  - type: smart
renderer:
  type: mkdocs
  pages:
    - title: Home
      name: index
      source: README.md
    - title: API Documentation
      children:
        - title: Data
          contents: [data]
  mkdocs_config:
    site_name: PV solar farm forecasting
    theme: readthedocs
    repo_url: https://github.com/openclimatefix/pv-solar-farm-forecasting
26 changes: 26 additions & 0 deletions requirements.txt
@@ -0,0 +1,26 @@
torch
torchdata
Cartopy>=0.20.3
xarray
zarr
fsspec
einops
numpy
pandas
rioxarray
pathy
pyaml_env
nowcasting_datamodel
gitpython
geopandas
dask
pvlib
jpeg_xl_float_with_nans
h5netcdf
tqdm
bottleneck
pyproj
pyresample
fastparquet
scipy
pytorch_lightning
22 changes: 22 additions & 0 deletions tests/scripts/test_download_data.py
@@ -0,0 +1,22 @@
from ukpn.scripts import construct_url, get_metadata


def test_download_metadata():
    canterbury_api_url = "https://ukpowernetworks.opendatasoft.com/api/records/1.0/search/?dataset=embedded-capacity-register&q=&facet=grid_supply_point&facet=licence_area&facet=energy_conversion_technology_1&facet=flexible_connection_yes_no&facet=connection_status&facet=primary_resource_type_group&refine.grid_supply_point=CANTERBURY+NORTH&refine.energy_conversion_technology_1=Photovoltaic"
    download = get_metadata(api_url=canterbury_api_url, print_data=True)
    assert download is not None


def test_construct_url():
    url = construct_url(
        list_of_facets=[
            "grid_supply_point",
            "licence_area",
            "energy_conversion_technology_1",
            "flexible_connection_yes_no",
            "connection_status",
            "primary_resource_type_group",
        ],
        refiners=["grid_supply_point", "energy_conversion_technology_1"],
        refine_values=["CANTERBURY+NORTH", "Photovoltaic"],
    )
    search_url = get_metadata(api_url=url, print_data=True)
    assert search_url is not None
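
The parameters in test_construct_url describe the same query as the hard-coded Canterbury URL in test_download_metadata, so a stricter check (a sketch, not part of this diff) could assert that construct_url reproduces that URL byte for byte:

    # Sketch: the constructed url should match the known Canterbury query exactly.
    expected = (
        "https://ukpowernetworks.opendatasoft.com/api/records/1.0/search/"
        "?dataset=embedded-capacity-register&q="
        "&facet=grid_supply_point&facet=licence_area"
        "&facet=energy_conversion_technology_1&facet=flexible_connection_yes_no"
        "&facet=connection_status&facet=primary_resource_type_group"
        "&refine.grid_supply_point=CANTERBURY+NORTH"
        "&refine.energy_conversion_technology_1=Photovoltaic"
    )
    assert url == expected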
2 changes: 2 additions & 0 deletions ukpn/__init__.py
@@ -0,0 +1,2 @@
"""DataPipes"""
from ukpn import scripts
2 changes: 2 additions & 0 deletions ukpn/scripts/__init__.py
@@ -0,0 +1,2 @@
"""Import Functions"""
from .download_data import construct_url, get_metadata
87 changes: 87 additions & 0 deletions ukpn/scripts/download_data.py
@@ -0,0 +1,87 @@
"""This class is ued to retrieve data through API calls"""
import json
import logging
from pprint import pprint

import requests

logger = logging.getLogger(__name__)


def get_metadata(api_url: str, print_data: bool = False):
    """
    This function retrieves metadata through api calls

    Args:
        api_url: The api url link that emits data in JSON format
        print_data: Optionally print the retrieved data
    """
    response_api = requests.get(api_url)
    if response_api.status_code != 200:
        logger.warning(f"The api response {response_api.status_code} is unsuccessful")
        logger.warning("Please enter the correct url")
        return None
    logger.info(f"The api response {response_api.status_code} is successful")

    # Get the data from the response
    raw_data = response_api.text

    # Parse the data into json format
    data_json = json.loads(raw_data)
    data_first_record = data_json["records"][0]

    if print_data:
        pprint(data_first_record)

    return data_first_record


def construct_url(
    dataset_name: str = "embedded-capacity-register",
    list_of_facets=None,
    refiners=None,
    refine_values=None,
):
    """This function constructs a downloadable url of JSON data

    For more information, please visit
    - https://ukpowernetworks.opendatasoft.com/pages/home/

    Args:
        dataset_name: Name of the dataset that needs to be downloaded, defined by UKPN
        list_of_facets: List of facets that need to be included in the JSON data
        refiners: List of refiner terms that need to be refined from the JSON data
        refine_values: List of refine values for the refiners

    Note:
        refiners and refine_values need to be mapped one-to-one
    """
    # Constructing a base url
    base_url = "https://ukpowernetworks.opendatasoft.com/api/records/1.0/search/?dataset="
    base_url = base_url + dataset_name

    # A separator in the url
    separator = "&"

    # An empty query term in the url
    query = "q="

    # A facet prefix in the url
    facet_prefix = "facet="

    # Constructing a facet string from the list of facets
    facet_str = [facet_prefix + x for x in list_of_facets]
    facet_str = separator.join(facet_str)
    facet_str = str(query + separator + facet_str)

    # Constructing a refiner string to refine the JSON data
    refine_prefix = "refine."
    refiners = [refine_prefix + x for x in refiners]
    refiners = list(map(lambda x, y: x + "=" + y, refiners, refine_values))
    refiners = separator.join(refiners)

    # Constructing the final url
    final_url = [base_url, facet_str, refiners]
    final_url = separator.join(final_url)
    return final_url
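
Taken together, the two functions form a minimal download path: construct_url builds the query string and get_metadata fetches and parses the first record. A usage sketch, assuming the Canterbury North query from the tests above and a reachable UKPN endpoint:

from ukpn.scripts import construct_url, get_metadata

# Build the search URL for Photovoltaic connections at the
# CANTERBURY NORTH grid supply point, then fetch the first record.
url = construct_url(
    list_of_facets=["grid_supply_point", "energy_conversion_technology_1"],
    refiners=["grid_supply_point", "energy_conversion_technology_1"],
    refine_values=["CANTERBURY+NORTH", "Photovoltaic"],
)
record = get_metadata(api_url=url, print_data=True)  # returns None on a failed request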