-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
7 changed files
with
1,099 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
Snippets | ||
======== | ||
|
||
Notes and writeups of handy description areas, that don't yet have a home. | ||
|
||
Data component (NameMap) dictionaries | ||
------------------------------------- | ||
For all of these properties, dictionary-style behaviour means that its ``.keys()`` | ||
is a sequence of the content names, and ``.values()`` is a sequence of the contained | ||
objects. | ||
|
||
|
||
NcData | ||
------ | ||
The :class:`~ncdata.NcData` class represents either a dataset or group, | ||
the structures of these are identical. | ||
|
||
NcAttributes | ||
------------ | ||
attributes are stored as NcAttribute objects, rather than simple name: value maps. | ||
thus an 'attribute' of a NcVariable or NcData is an attribute object, not a value. | ||
|
||
Thus: | ||
|
||
>>> variable.attributes['x'] | ||
NcAttribute('x', [1., 2., 7.]) | ||
|
||
The attribute has a ``.value`` property, but it is most usefully accessed with the | ||
:meth:`~ncdata.NcAttribute.as_python_value()` method : | ||
|
||
>>> attr = NcAttribute('b', [1.]) | ||
>>> attr.value | ||
array([1.]) | ||
>>> attr.as_python_value() | ||
array(1.) | ||
|
||
>>> attr = NcAttribute('a', "this") | ||
>>> attr.value | ||
array('this', dtype='<U4') | ||
>>> attr.as_python_value() | ||
'this' | ||
|
||
From within a parent object's ``.attributes`` dictionary, | ||
|
||
|
||
Component Dictionaries | ||
---------------------- | ||
ordering | ||
- insert, remove, rename effects | ||
re-ordering | ||
|
||
|
||
As described :ref:`above <howto_access>`, sub-components are stored under their names | ||
in a dictionary container. | ||
|
||
Since all components have a name, and are stored by name in the parent property | ||
dictionary (e.g. ``variable.attributes`` or ``data.dimensions``), the component | ||
dictionaries have an :meth:`~ncdata.NameMap.add` method, which works with the component | ||
name. | ||
supported operations | ||
^^^^^^^^^^^^^^^^^^^^ | ||
standard dict methods : del, getitem, setitem, clear, append, extend | ||
extra methods : add, addall | ||
|
||
ordering | ||
^^^^^^^^ | ||
For Python dictionaries in general, | ||
since `announced in Python 3.7 <https://mail.python.org/pipermail/python-dev/2017-December/151283.html>`_, | ||
the order of the entries is now a significant and stable feature of Python dictionaries. | ||
There | ||
Also as for Python dictionaries generally, there is no particular assistance for | ||
managing or using the order. The following may give some indication: | ||
|
||
extract 'n'th item: ``data.variables[list(elelments.keys())[n]]`` | ||
sort the list: | ||
# get all the contents, sorted by name | ||
content = list(data.attributes.values()) | ||
content = sorted(content, key= lambda v: v.name) | ||
# clear the container -- necessary to forget the old ordering | ||
data.attributes.clear() | ||
# add all back in, in the new order | ||
data.attributes.addall(content) | ||
|
||
New entries are added last, and renamed entries retain their | ||
|
||
The :meth:`~ncdata.utils/dataset_differences` method reports differences in the | ||
ordering of components (unless turned off). | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,274 @@ | ||
Core Data Objects | ||
================= | ||
Ncdata uses Python objects to represent netCDF data, and allows the user to freely | ||
inspect and/or modify it, aiming to do this is the most natural and pythonic way. | ||
|
||
.. _data-model: | ||
|
||
Data Classes | ||
------------ | ||
The data model components are elements of the | ||
`NetCDF Classic Data Model`_ , plus **groups** (from the 'enhanced' netCDF model). | ||
|
||
That is, a Dataset(File) consists of just Dimensions, Variables, Attributes and | ||
Groups. | ||
|
||
.. note:: | ||
We are not, as yet, explicitly supporting the NetCDF4 extensions to variable-length | ||
and user types. See : :ref:`data-types` | ||
|
||
The core ncdata classes representing these Data Model components are | ||
:class:`~ncdata.NcData`, :class:`~ncdata.NcDimension`, :class:`~ncdata.NcVariable` and | ||
:class:`~ncdata.NcAttribute`. | ||
|
||
Notes : | ||
|
||
* There is no "NcGroup" class : :class:`~ncdata.NcData` is used for both the "group" and | ||
"dataset" (aka file). | ||
|
||
* All data objects have a ``.name`` property, but this can be empty (``None``) when it is not | ||
contained in a parent object as a component. See :ref:`components-and-containers`, | ||
below. | ||
|
||
|
||
:class:`~ncdata.NcData` | ||
^^^^^^^^^^^^^^^^^^^^^^^ | ||
This represents a dataset containing variables, attributes and groups. | ||
It is also used to represent groups. | ||
|
||
:class:`~ncdata.NcDimension` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
This represents a dimension, defined in terms of name, length, and whether "unlimited" | ||
(or not). | ||
|
||
:class:`~ncdata.NcVariable` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
Represents a data variable, with dimensions and, optionally, data and attributes. | ||
|
||
Note that ``.dimensions`` is simply a list of names (strings) : they are not | ||
:class:`~ncdata.NcDimension` objects, and not linked to actual dimensions of the | ||
dataset, so *actual* dimensions are only identified dynamically, when they need to be. | ||
|
||
Variables can be created with either real (numpy) or lazy (dask) arrays, or no data at | ||
all. | ||
|
||
A variable has a ``.dtype``, which may be set if creating with no data. | ||
However, at present, after creation ``.data`` and ``.dtype`` can be reassigned and there | ||
is no further checking of any sort. | ||
|
||
.. _variable-dtypes: | ||
|
||
Variable Data Arrays | ||
"""""""""""""""""""" | ||
When a variable does have a ``.data`` property, this will be an array, with at least | ||
the usual ``shape``, ``dtype`` and ``__getitem__`` properties. In practice we assume | ||
for now that we will always have real (numpy) or lazy (dask) arrays. | ||
|
||
When data is exchanged with an actual file, it is simply written if real, and streamed | ||
(via :meth:`dask.array.store`) if lazy. | ||
|
||
When data is exchanged with supported data analysis packages (i.e. Iris or Xarray, so | ||
far), these arrays are transferred directly without copying or making duplicates (such | ||
as numpy views). | ||
This is a core principle (see :ref:`design-principles`), but may require special support in | ||
those packages. | ||
|
||
See also : :ref:`data-types` | ||
|
||
:class:`~ncdata.NcAttribute` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
Represents an attribute, with name and value. The value is always either a scalar | ||
or a 1-D numpy array -- this is enforced as a computed property (read and write). | ||
|
||
.. _attribute-dtypes: | ||
|
||
Attribute Values | ||
"""""""""""""""" | ||
In actual netCDF data, the value of an attribute is effectively limited to a one-dimensional | ||
array of certain valid netCDF types, and one-element arrays are exactly equivalent to scalar values. | ||
|
||
In ncdata, the ``.value`` of an :class:`ncdata.NcAttribute` must always be a numpy array, and | ||
when creating one the provided ``.value`` is cast with :func:`numpy.asanyarray`. | ||
|
||
However you are not prevented from setting an attributes ``.value`` to something other than | ||
an array, which may cause an error. So for now, if setting the value of an existing attribute, | ||
ensure you always write compatible numpy data, or use :meth:`ncdata.NameMap.set_attrval` which is safe. | ||
|
||
For *reading* attributes, it is best to use :meth:`ncdata.NameMap.get_attrval` or (equivalently) | ||
:meth:`ncdata.NcAttribute.as_python_value()` : These consistently return either | ||
``None`` (if missing); a numpy scalar; or array; or a Python string. Those results are | ||
intended to be equivalent to what you should get from storing in an actual file and reading back, | ||
including re-interpreting a length-one vector as a scalar value. | ||
|
||
.. attention:: | ||
The correct handling and (future) discrimination of string data as character arrays ("char" in netCDF terms) | ||
and/or variable-length strings ("string" type) is still to be determined. | ||
|
||
For now, we are converting **all** string attributes to python strings. | ||
|
||
There is **also** a longstanding known problem with the low-level C (and FORTRAN) interface, which forbids the | ||
creation of vector character attributes, which appear as single concatenated strings. So for now, **all** | ||
string-type attributes appear as single Python strings (you never get an array of strings or list of strings). | ||
|
||
See also : :ref:`data-types` | ||
|
||
.. _correctness-checks: | ||
|
||
Correctness and Consistency | ||
--------------------------- | ||
In practice, to support flexibility in construction and manipulation, it is | ||
not practical for ncdata structures to represent valid netCDF at | ||
all times, since this would makes changing things awkward. | ||
For example, if a group refers to a dimension *outside* the group, you could not simply | ||
extract it from the dataset because it is not valid in isolation. | ||
|
||
Thus, we do allow that ncdata structures represent *invalid* netCDF data. | ||
For example, circular references, missing dimensions or naming mismatches. | ||
Effectively there are a set of data validity rules, which are summarised in the | ||
:func:`ncdata.utils.save_errors` routine. | ||
|
||
In practice, there a minimal set of runtime rules for creating ncdata objects, and | ||
additional requirements when ncdata is converted to actual netCDF. For example, | ||
variables can be initially created with no data. But if subsequently written to a file, | ||
data must be assigned first. | ||
|
||
.. Note:: | ||
These issues are not necessarily all fully resolved. Caution required ! | ||
|
||
.. _components-and-containers: | ||
|
||
Components, Containers and Names | ||
-------------------------------- | ||
Each dimension, variable, attribute or group normally exists as a component in a | ||
parent dataset (or group), where it is stored in a "container" property of the parent, | ||
i.e. either its ``.dimensions``, ``.variables``, ``.attributes`` or ``.groups``. | ||
|
||
Each of the "container" properties is a :class:`~ncdata._core.NameMap` object, which | ||
is a dictionary type mapping a string (name) to a specific type of components. | ||
The dictionary``.keys()`` are a sequence of component names, and its ``.values()`` are | ||
the corresponding contained components. | ||
|
||
Every component object also has a ``.name`` property. By this, it is implicit that you | ||
**could** have a difference between the name by which the object is indexed in its | ||
container, and its ``.name``. This is to be avoided ! | ||
|
||
The :meth:`~ncdata.NameMap` container class is provided with convenience methods which | ||
aim to make this easier, such as :meth:`~ncdata.NameMap.add` and | ||
:meth:`~ncdata.NameMap.rename`. | ||
|
||
NcData and NcVariable ".attributes" components | ||
---------------------------------------------- | ||
Note that the contents of a ".attributes" are :class:`~ncdata.NcAttributes` objects, | ||
not attribute values. | ||
|
||
Thus to fetch an attribute you might write, for example one of these : | ||
|
||
.. code-block:: | ||
units1 = dataset.variables['var1'].get_attrval('units') | ||
units1 = dataset.variables['var1'].attributes['units'].as_python_value() | ||
but **not** ``unit = dataset.variables['x'].attributes['attr1']`` | ||
|
||
And not ``unit = dataset.variables['x'].attributes['attr1']`` | ||
|
||
Or, likewise, to ***set*** values, one of | ||
|
||
.. code-block:: | ||
dataset.variables['var1'].set_attrval('units', "K") | ||
dataset.variables['var1'].attributes['units'] = NcAttribute("units", K) | ||
but **not** ``dataset.variables['x'].attributes['units'].value = "K"`` | ||
|
||
|
||
Container ordering | ||
------------------ | ||
The order of elements of a container is technically significant, and does constitute a | ||
potential difference between datasets (or files). | ||
|
||
The :meth:`ncdata.NameMap.rename` method preserves the order of an element, | ||
while :meth:`ncdata.NameMap.add` adds the new components at the end. | ||
|
||
The :func:`ncdata.utils.dataset_differences` utility provides various keywords allowing | ||
you to ignore ordering in comparisons, when required. | ||
|
||
|
||
Container methods | ||
----------------- | ||
The :class:`~ncdata.NameMap` class also provides a variety of manipulation methods, | ||
both normal dictionary operations and some extra ones. | ||
|
||
The most notable ones are : ``del``, ``pop``, ``add``, ``addall``, ``rename`` and of | ||
course ``__setitem__`` . | ||
|
||
See :ref:`common_operations` section. | ||
|
||
.. _data-constructors: | ||
|
||
Core Object Constructors | ||
------------------------ | ||
The ``__init__`` methods of the core classes are designed to make in-line definition of | ||
new objects in user code reasonably legible. So, when initialising one of the container | ||
properties, the keyword/args defining component parts use the utility method | ||
:meth:`ncdata.NameMap.from_items` so that you can specify a group of components in a variety of ways : | ||
either a pre-created container or a similar dictionary-like object : | ||
|
||
.. code-block:: python | ||
>>> ds1 = NcData(groups={ | ||
... 'x':NcData('x'), | ||
... 'y':NcData('y') | ||
... }) | ||
>>> print(ds1) | ||
<NcData: <'no-name'> | ||
groups: | ||
<NcData: x | ||
> | ||
<NcData: y | ||
> | ||
> | ||
or **more usefully**, just a *list* of suitable data objects, like this... | ||
|
||
.. code-block:: python | ||
>>> ds2 = NcData( | ||
... variables=[ | ||
... NcVariable('v1', ('x',), data=[1,2]), | ||
... NcVariable('v2', ('x',), data=[2,3]) | ||
... ] | ||
... ) | ||
>>> print(ds2) | ||
<NcData: <'no-name'> | ||
variables: | ||
<NcVariable(int64): v1(x)> | ||
<NcVariable(int64): v2(x)> | ||
> | ||
Or, in the **special case of attributes**, a regular dictionary of ``name: value`` form | ||
will be automatically converted to a NameMap of ``name: NcAttribute(name: value)`` : | ||
|
||
.. code-block:: python | ||
>>> var = NcVariable( | ||
... 'v3', | ||
... attributes={'x':'this', 'b':1.4, 'arr': [1, 2, 3]} | ||
... ) | ||
>>> print(var) | ||
<NcVariable(<no-dtype>): v3() | ||
v3:x = 'this' | ||
v3:b = 1.4, | ||
v3:arr = array([1, 2, 3]) | ||
> | ||
Relationship to File Storage | ||
---------------------------- | ||
Note that file-specific storage aspects, such as chunking, data-paths or compression | ||
strategies, are not recorded in the core objects. However, array representations in | ||
variable and attribute data (notably dask lazy arrays) may hold such information. | ||
The concept of "unlimited" dimensions is arguably an exception. However, this is a | ||
core provision in the NetCDF data model itself (see "Dimension" in the `NetCDF Classic Data Model`_). | ||
|
||
.. _NetCDF Classic Data Model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html#classic_model |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.