
Add xarray.Datatree() guidance for ICESat-2 ATL06 tutorial #92

Merged: 5 commits merged on Feb 17, 2025
58 changes: 36 additions & 22 deletions notebooks/ICESat-2_Cloud_Access/ATL06-direct-access.ipynb
@@ -17,20 +17,18 @@
"\n",
"## **1. Tutorial Overview**\n",
"\n",
"**Note: This is an updated version of the notebook that was presented to the NSIDC DAAC User Working Group in May 2022**\n",
"\n",
"This notebook demonstrates searching for cloud-hosted ICESat-2 data and directly accessing Land Ice Height (ATL06) granules from an Amazon Elastic Compute Cloud (EC2) instance using the `earthaccess` package. NASA data \"in the cloud\" are stored in Amazon Web Services (AWS) Simple Storage Service (S3) Buckets. **Direct Access** is an efficient way to work with data stored in an S3 Bucket when you are working in the cloud. Cloud-hosted granules can be opened and loaded into memory without the need to download them first. This allows you to take advantage of the scalability and power of cloud computing. \n",
"\n",
"The Amazon Global cloud is divided into geographical regions. To have direct access to data stored in a region, our compute instance - a virtual computer that we create to perform processing operations in place of using our own desktop or laptop - must be in the same region as the data. This is a fundamental concept of _analysis in place_. **NASA cloud-hosted data is in Amazon region us-west-2, so your compute instance must also be in us-west-2.** To use direct access for data stored in another region, we would start a compute instance in that region.\n",
"\n",
"As an example data collection, we use ICESat-2 Land Ice Height (ATL06) over the Juneau Icefield, AK, for March 2020. ICESat-2 data granules, including ATL06, are stored in HDF5 format. We demonstrate how to open an HDF5 granule and access data variables using `xarray`. Land Ice Heights are then plotted using `hvplot`. \n",
"\n",
"`earthaccess` is a package developed by Luis Lopez (NSIDC developer) to allow easy search of the NASA Common Metadata Repository (CMR) and download of NASA data collections. It can be used for programmatic search and access for both _DAAC-hosted_ and _cloud-hosted_ data. It manages authenticating using Earthdata Login credentials which are then used to obtain the S3 tokens that are needed for S3 direct access. https://github.com/nsidc/earthaccess\n",
"`earthaccess` is a community-developed Python package, originally created by Luis Lopez (NSIDC developer), that allows easy search of the NASA Common Metadata Repository (CMR) and download of NASA data collections. It can be used for programmatic search and access for both _on-premises-hosted_ and _cloud-hosted_ data. It manages authentication using Earthdata Login credentials, which are then used to obtain the S3 tokens that are needed for S3 direct access. https://github.com/nsidc/earthaccess\n",
"\n",
"\n",
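The workflow the paragraph above describes can be sketched as a small helper. This is a hypothetical function for illustration, not code from the notebook; it assumes `earthaccess` is installed in the cloud environment, and the import is deferred so the definition itself needs nothing installed:

```python
# Hypothetical helper sketching the earthaccess search-and-open workflow.
# The bounding box and date range are caller-supplied placeholders, not the
# tutorial's actual values.
def search_and_open_atl06(bounding_box, temporal):
    import earthaccess  # deferred import; assumed available in the cloud image

    earthaccess.login()  # Earthdata Login; S3 credentials are handled for us
    results = earthaccess.search_data(
        short_name="ATL06",
        bounding_box=bounding_box,  # (lon_min, lat_min, lon_max, lat_max)
        temporal=temporal,          # e.g. ("2020-03-01", "2020-03-31")
    )
    return earthaccess.open(results)  # file-like objects backed by S3
```

Calling it inside a us-west-2 instance would return open file handles ready to pass to `xarray`.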
"### **Credits**\n",
"\n",
"The notebook was created by Andy Barrett, NSIDC, updated by Jennifer Roebuck, NSIDC, and is based on notebooks developed by Luis Lopez and Mikala Beig, NSIDC.\n",
"The notebook was created by Andy Barrett, NSIDC, updated by Jennifer Roebuck and Amy Steiker, NSIDC, and is based on notebooks developed by Luis Lopez and Mikala Beig, NSIDC.\n",
"\n",
"For questions regarding the notebook, or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).\n",
"\n",
@@ -39,7 +37,7 @@
"By the end of this demonstration you will be able to: \n",
"1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results; \n",
"2. open data granules using direct access to the ICESat-2 S3 bucket; \n",
"3. load a HDF5 group into an `xarray.Dataset`; \n",
"3. load HDF5 data into an `xarray.DataTree` object;\n",
"4. visualize the land ice heights using `hvplot`. \n",
"\n",
"### **Prerequisites**\n",
@@ -158,7 +156,7 @@
"id": "d3957627",
"metadata": {},
"source": [
"In this case there are 65 collections that have the keyword ICESat-2.\n",
"Several dozen collections with the keyword ICESat-2 are returned in the Query object.\n",
"\n",
"The `search_datasets` method returns a python list of `DataCollection` objects. We can view the metadata for each collection in long form by passing a `DataCollection` object to print or as a summary using the `summary` method. We can also use the `pprint` function to Pretty Print each object.\n",
"\n",
Expand Down Expand Up @@ -267,9 +265,9 @@
"source": [
"To display the rendered metadata, including the download link, granule size and two images, we will use `display`. In the example below, all 4 results are shown. \n",
"\n",
"The download link is `https` and can be used download the granule to your local machine. This is similar to downloading _DAAC-hosted_ data but in this case the data are coming from the Earthdata Cloud. For NASA data in the Earthdata Cloud, there is no charge to the user for egress from AWS Cloud servers. This is not the case for other data in the cloud.\n",
"The download link is `https` and can be used to download the granule to your local machine. This is similar to downloading data located _on-premises_ but in this case the data are coming from the Earthdata Cloud. For NASA data in the Earthdata Cloud, there is no charge to the user for egress from AWS Cloud servers. This may not be the case for other data in the cloud.\n",
"\n",
"Note the `[None, None, None, None]` that is displayed at the end can be ignored, it has no meaning in relation to the metadata."
"Note the `[None, None, None, None]` that is displayed at the end can be ignored; it has no meaning in relation to the metadata."
]
},
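The rendering step above can be captured in a small helper. This is an illustrative sketch, not notebook code; it assumes a Jupyter session where `results` holds the granules returned by the earlier search, and defers the IPython import so the definition stands alone:

```python
# Hypothetical convenience wrapper around IPython's display for search results.
def show_results(results, n=4):
    from IPython.display import display  # deferred; assumes a Jupyter session

    for granule in results[:n]:
        display(granule)  # rendered metadata: download link, size, browse images
```

In the notebook itself, `display` is called on each result directly; the wrapper only bundles the loop.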
{
@@ -291,40 +289,56 @@
"source": [
"## Use Direct-Access to open, load and display data stored on S3\n",
"\n",
"Direct-access to data from an S3 bucket is a two step process. First, the files are opened using the `open` method. The `auth` object created at the start of the notebook is used to provide Earthdata Login authentication and AWS credentials.\n",
"\n",
"The next step is to load the data. In this case, data are loaded into an `xarray.Dataset`. Data could be read into `numpy` arrays or a `pandas.Dataframe`. However, each granule would have to be read using a package that reads HDF5 granules such as `h5py`. `xarray` does this all _under-the-hood_ in a single line but for a single group in the HDF5 granule*.\n",
"\n",
"*ICESat-2 measures photon returns from 3 beam pairs numbered 1, 2 and 3 that each consist of a left and a right beam. In this case, we are interested in the left ground track (gt) of beam pair 1. "
"Direct-access to data from an S3 bucket is a two step process. First, the files are opened using the `open` method. The `auth` object created at the start of the notebook is used to provide Earthdata Login authentication and AWS credentials. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11205bbb",
"id": "e50bf87d-1c83-42c7-b645-3948b15b7675",
"metadata": {},
"outputs": [],
"source": [
"files = earthaccess.open(results)\n",
"ds = xr.open_dataset(files[1], group='/gt1l/land_ice_segments')"
"files = earthaccess.open(results)"
]
},
{
"cell_type": "markdown",
"id": "cecdf984-ce9f-41c0-946b-3a0fa1ce40bc",
"metadata": {},
"source": [
"The next step is to load the data. `xarray.DataTree` objects allow us to work with hierarchical data structures and file formats such as HDF5, Zarr and NetCDF4 with groups. \n",
"\n",
"We use `xr.open_datatree` to open the ATL06 data. We add the `phony_dims=\"sort\"` option because data variables in several groups, including `ancillary_data`, do not have assigned dimension scales; `xarray` then names the dimensions `phony_dim0`, `phony_dim1`, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75881751",
"id": "013136a2-80fd-4625-aca8-1a64775c9593",
"metadata": {},
"outputs": [],
"source": [
"ds"
"dt = xr.open_datatree(files[1], phony_dims='sort')\n",
"dt"
]
},
{
"cell_type": "markdown",
"id": "56677586-a2cd-4ddb-8e57-457ced954331",
"metadata": {},
"source": [
"We can see from the representation of the `xarray.DataTree` object `dt` that there are ten groups in the top, or \"root\", level. Clicking on Groups reveals various Metadata and Ancillary data groups as well as groups representing each of the left and right beam pairs from the ICESat-2 ATLAS instrument*. We can also see that there are no dimensions, coordinates, or data variables in the root group. Reading the data into `numpy` arrays or a `pandas.DataFrame` could be an alternative to using `xarray.DataTree`. However, each granule (file) would have to be read first using a package that reads HDF5 files, such as `h5py`. `xarray` does this all under-the-hood in a single line.\n",
"\n",
"*ICESat-2 measures photon returns from 3 beam pairs numbered 1, 2 and 3 that each consist of a left and a right beam. In this case, we are interested in plotting the left ground track (gt) of beam pair 1. "
]
},
{
"cell_type": "markdown",
"id": "1282ce34",
"metadata": {},
"source": [
"`hvplot` is an interactive plotting tool that is useful for exploring data."
"`hvplot` is an interactive plotting tool that is useful for exploring data:"
]
},
{
@@ -334,7 +348,7 @@
"metadata": {},
"outputs": [],
"source": [
"ds['h_li'].hvplot(kind='scatter', s=2)"
"dt['/gt1l/land_ice_segments/h_li'].hvplot(kind='scatter', s=2)"
]
},
{
Expand All @@ -347,7 +361,7 @@
"We have learned how to:\n",
"1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results;\n",
"2. open data granules using direct access to the ICESat-2 S3 bucket;\n",
"3. load a HDF5 group into an xarray.Dataset;\n",
"3. load a HDF5 group into an xarray.DataTree;\n",
"4. visualize the land ice heights using hvplot."
]
},
@@ -394,7 +408,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.11.11"
}
},
"nbformat": 4,