From df745a57b62e39bd1a67a06244d57d5d34e68689 Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Tue, 11 Jul 2023 02:48:10 -0600 Subject: [PATCH 01/20] adding the links to all indexing materials --- workshops/scipy2023/README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/workshops/scipy2023/README.md b/workshops/scipy2023/README.md index 552c5c9f..cd0868be 100644 --- a/workshops/scipy2023/README.md +++ b/workshops/scipy2023/README.md @@ -63,7 +63,11 @@ Once your codespace is launched, the following happens: ``` ```{dropdown} Indexing +{doc}`../../fundamentals/02.1_indexing_Basic` + {doc}`../../intermediate/indexing/advanced-indexing` + +{doc}`../../intermediate/indexing/boolean-masking-indexing` ``` ```{dropdown} Computational Patterns From 155c810ba5568b4d566b96da2c66bf9daf067b2c Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Tue, 11 Jul 2023 02:55:33 -0600 Subject: [PATCH 02/20] typo fix + remove a redundant example. --- intermediate/indexing/advanced-indexing.ipynb | 51 ++++++++----------- 1 file changed, 20 insertions(+), 31 deletions(-) diff --git a/intermediate/indexing/advanced-indexing.ipynb b/intermediate/indexing/advanced-indexing.ipynb index 804af21c..efaf6556 100644 --- a/intermediate/indexing/advanced-indexing.ipynb +++ b/intermediate/indexing/advanced-indexing.ipynb @@ -17,7 +17,7 @@ "source": [ "## Overview\n", "\n", - "In the previous notebooks, we learned basic forms of indexing with xarray (positional and name based dimensions, integer and label based indexing), Datetime Indexing, and nearest neighbor lookups. In this tutorial, we will lean how Xarray indexing is different from Numpy and how to do vectorized/pointwise indexing using Xarray. \n", + "In the previous notebooks, we learned basic forms of indexing with xarray (positional and name based dimensions, integer and label based indexing), Datetime Indexing, and nearest neighbor lookups. In this tutorial, we will learn how Xarray indexing is different from Numpy and how to do vectorized/pointwise indexing using Xarray. 
\n", "First, let's import packages needed for this repository: " ] }, @@ -107,6 +107,18 @@ "da.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "target_lat = xr.DataArray([31, 41, 42, 42], dims=\"degrees_north\")\n", + "target_lon = xr.DataArray([200, 201, 202, 205], dims=\"degrees_east\")\n", + "\n", + "da.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -229,35 +241,6 @@ "```" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Analogously, label-based pointwise-indexing is also possible by the `.sel()` method:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "da = xr.DataArray(\n", - " np.random.rand(4, 3),\n", - " [\n", - " (\"time\", pd.date_range(\"2000-01-01\", periods=4)),\n", - " (\"space\", [\"IA\", \"IL\", \"IN\"]),\n", - " ],\n", - ")\n", - "times = xr.DataArray(pd.to_datetime([\"2000-01-03\", \"2000-01-02\", \"2000-01-01\"]), dims=\"new_time\")\n", - "\n", - "\n", - "# -- get data for each state and each time:\n", - "da.sel(space=xr.DataArray([\"IA\", \"IL\", \"IN\"], dims=[\"new_time\"]), time=times)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -269,6 +252,11 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -278,7 +266,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.11.4" }, "toc": { "base_numbering": 1, From 407542e58af8fd5d3db3aea07fd5f182474bf22b Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 12:21:54 -0600 Subject: [PATCH 03/20] initial commits and contents --- intermediate/intro-to-zarr.ipynb | 63 ++++++++++++++++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 intermediate/intro-to-zarr.ipynb diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb new file mode 100644 index 00000000..1bbc9f05 --- /dev/null +++ b/intermediate/intro-to-zarr.ipynb @@ -0,0 +1,63 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8253fe2d", + "metadata": {}, + "source": [ + "# Intro to Zarr\n", + "\n", + "This notebook provides a brief introduction to Zarr and how to\n", + "use it in cloud environments for scalable, chunked, and compressed data storage.\n", + "Zarr is a file format with implementations in different languages. In this tutorial, we will look at an example of how to use the Zarr format by looking at some features of the `zarr-python` library and how Zarr files can be opened with `xarray`.\n", + "\n", + "## What is Zarr?\n", + "\n", + "The Zarr data format is an open, community-maintained format designed for efficient, scalable storage of large N-dimensional arrays. 
It stores data as compressed and chunked arrays in a format well-suited to parallel processing and cloud-native workflows.\n", + "\n", + "### Zarr Data Organization:\n", + "- **Arrays**: N-dimensional arrays that can be chunked and compressed.\n", + "- **Groups**: A container for organizing multiple arrays and other groups with a hierarchical structure.\n", + "- **Metadata**: JSON-like metadata describing the arrays and groups, including information about dimensions, data types, and compression.\n", + "- **Dimensions and Shape**: Arrays can have any number of dimensions, and their shape is defined by the number of elements in each dimension.\n", + "- **Coordinates & Indexing**: Zarr supports coordinate arrays for each dimension, allowing for efficient indexing and slicing.\n", + "\n", + "The diagram below showing the structure of a Zarr file:\n", + "![EarthData](https://learning.nceas.ucsb.edu/2025-04-arctic/images/zarr-chunks.png)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae9c38ed", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "233640b0", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "The Xarray library provides a rich API for working with Zarr data, slicing, and selecting data. \n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 2c1b2bfdf558ac3b76f4725c918ab9efd7ddf38d Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 12:38:53 -0600 Subject: [PATCH 04/20] data storage --- intermediate/intro-to-zarr.ipynb | 325 +++++++++++++++++++++++++++++-- 1 file changed, 311 insertions(+), 14 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 1bbc9f05..348cd8ca 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -27,35 +27,332 @@ ] }, { - "cell_type": "code", - "execution_count": null, - "id": "ae9c38ed", + "cell_type": "markdown", + "id": "89a8f0ec", "metadata": { "vscode": { "languageId": "plaintext" } }, - "outputs": [], - "source": [] + "source": [ + "### Zarr Fundamenals\n", + "A Zarr array has the following important properties:\n", + "- **Shape**: The dimensions of the array.\n", + "- **Dtype**: The data type of each element (e.g., float32).\n", + "- **Attributes**: Metadata stored as key-value pairs (e.g., units, description.\n", + "- **Compressors**: Algorithms used to compress each chunk (e.g., Blosc, Zlib).\n", + "\n", + "\n", + "#### Example: Creating and Inspecting a Zarr Array" + ] }, { "cell_type": "code", - "execution_count": null, - "id": "233640b0", - "metadata": { - "vscode": { - "languageId": "plaintext" + "execution_count": 1, + "id": "ae9c38ed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" } - }, - "outputs": [], + ], "source": [ - "The Xarray library provides a rich API for working with Zarr data, slicing, and selecting data. \n" + "import zarr\n", + "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')\n", + "z" ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "0f39867a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 337
Storage ratio      : 47.5
Chunks initialized : 0/20
" + ], + "text/plain": [ + "Type : zarr.core.Array\n", + "Data type : float64\n", + "Shape : (40, 50)\n", + "Chunk shape : (10, 10)\n", + "Order : C\n", + "Read-only : False\n", + "Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n", + "Store type : zarr.storage.DirectoryStore\n", + "No. bytes : 16000 (15.6K)\n", + "No. bytes stored : 337\n", + "Storage ratio : 47.5\n", + "Chunks initialized : 0/20" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.info" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "dbe47985", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.fill_value" + ] + }, + { + "cell_type": "markdown", + "id": "f5dcee68", + "metadata": {}, + "source": [ + "No data has been written to the array yet. If we try to access the data, we will get a fill value: " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7d905f06", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z[0, 0]\n" + ] + }, + { + "cell_type": "markdown", + "id": "a6091ba5", + "metadata": {}, + "source": [ + "This is how we assign data to the array. When we do this it gets written immediately." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1ccc28b6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 1277 (1.2K)
Storage ratio      : 12.5
Chunks initialized : 20/20
" + ], + "text/plain": [ + "Type : zarr.core.Array\n", + "Data type : float64\n", + "Shape : (40, 50)\n", + "Chunk shape : (10, 10)\n", + "Order : C\n", + "Read-only : False\n", + "Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n", + "Store type : zarr.storage.DirectoryStore\n", + "No. bytes : 16000 (15.6K)\n", + "No. bytes stored : 1277 (1.2K)\n", + "Storage ratio : 12.5\n", + "Chunks initialized : 20/20" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z[:] = 1\n", + "z.info" + ] + }, + { + "cell_type": "markdown", + "id": "c6a059cc", + "metadata": {}, + "source": [ + "##### Attributes\n", + "\n", + "We can attach arbitrary metadata to our Array via attributes:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "859c9cfe", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'standard_name': 'wind_speed', 'units': 'm/s'}\n" + ] + } + ], + "source": [ + "z.attrs['units'] = 'm/s'\n", + "z.attrs['standard_name'] = 'wind_speed'\n", + "print(dict(z.attrs))" + ] + }, + { + "cell_type": "markdown", + "id": "23885ea0", + "metadata": {}, + "source": [ + "### Zarr Data Storage\n", + "\n", + "Zarr can be stored in memory, on disk, or in cloud storage systems like Amazon S3.\n", + "\n", + "Let's look under the hood. _The ability to look inside a Zarr store and understand what is there is a deliberate design decision._" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "1bbc935c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.store" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "51953f01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[01;34mtest.zarr\u001b[0m\n", + "├── \u001b[00m.zarray\u001b[0m\n", + "├── \u001b[00m.zattrs\u001b[0m\n", + "├── \u001b[00m0.0\u001b[0m\n", + "├── \u001b[00m0.1\u001b[0m\n", + "├── \u001b[00m0.2\u001b[0m\n", + "├── \u001b[00m0.3\u001b[0m\n", + "├── \u001b[00m0.4\u001b[0m\n", + "├── \u001b[00m1.0\u001b[0m\n", + "├── \u001b[00m1.1\u001b[0m\n" + ] + } + ], + "source": [ + "!tree -a test.zarr | head" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9a6365b7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'chunks': [10, 10], 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'dtype': ' Date: Fri, 4 Jul 2025 13:31:01 -0600 Subject: [PATCH 05/20] updates and cloud storage --- intermediate/intro-to-zarr.ipynb | 360 +++++++++++++++++++++++++------ 1 file changed, 292 insertions(+), 68 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 348cd8ca..ce2daed3 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -5,7 +5,7 @@ "id": "8253fe2d", "metadata": {}, "source": [ - "# Intro to Zarr\n", + "# Introduction to Zarr\n", "\n", "This notebook provides a brief introduction to Zarr and how to\n", "use it in cloud environments for scalable, chunked, and compressed data storage.\n", @@ -48,17 +48,17 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 13, "id": "ae9c38ed", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 1, + "execution_count": 13, 
"metadata": {}, "output_type": "execute_result" } @@ -71,31 +71,29 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 14, "id": "0f39867a", "metadata": {}, "outputs": [ { "data": { - "text/html": [ - "
Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 337
Storage ratio      : 47.5
Chunks initialized : 0/20
" - ], "text/plain": [ - "Type : zarr.core.Array\n", - "Data type : float64\n", + "Type : Array\n", + "Zarr format : 3\n", + "Data type : DataType.float64\n", + "Fill value : 0.0\n", "Shape : (40, 50)\n", "Chunk shape : (10, 10)\n", "Order : C\n", "Read-only : False\n", - "Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n", - "Store type : zarr.storage.DirectoryStore\n", - "No. bytes : 16000 (15.6K)\n", - "No. bytes stored : 337\n", - "Storage ratio : 47.5\n", - "Chunks initialized : 0/20" + "Store type : LocalStore\n", + "Filters : ()\n", + "Serializer : BytesCodec(endian=)\n", + "Compressors : (ZstdCodec(level=0, checksum=False),)\n", + "No. bytes : 16000 (15.6K)" ] }, - "execution_count": 2, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -106,17 +104,17 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 15, "id": "dbe47985", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "0.0" + "np.float64(0.0)" ] }, - "execution_count": 3, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -135,17 +133,17 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 16, "id": "7d905f06", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "0.0" + "array(0.)" ] }, - "execution_count": 4, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -164,31 +162,29 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 17, "id": "1ccc28b6", "metadata": {}, "outputs": [ { "data": { - "text/html": [ - "
Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 1277 (1.2K)
Storage ratio      : 12.5
Chunks initialized : 20/20
" - ], "text/plain": [ - "Type : zarr.core.Array\n", - "Data type : float64\n", + "Type : Array\n", + "Zarr format : 3\n", + "Data type : DataType.float64\n", + "Fill value : 0.0\n", "Shape : (40, 50)\n", "Chunk shape : (10, 10)\n", "Order : C\n", "Read-only : False\n", - "Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\n", - "Store type : zarr.storage.DirectoryStore\n", - "No. bytes : 16000 (15.6K)\n", - "No. bytes stored : 1277 (1.2K)\n", - "Storage ratio : 12.5\n", - "Chunks initialized : 20/20" + "Store type : LocalStore\n", + "Filters : ()\n", + "Serializer : BytesCodec(endian=)\n", + "Compressors : (ZstdCodec(level=0, checksum=False),)\n", + "No. bytes : 16000 (15.6K)" ] }, - "execution_count": 6, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -210,7 +206,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 18, "id": "859c9cfe", "metadata": {}, "outputs": [ @@ -218,7 +214,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "{'standard_name': 'wind_speed', 'units': 'm/s'}\n" + "{'units': 'm/s', 'standard_name': 'wind_speed'}\n" ] } ], @@ -242,17 +238,17 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 19, "id": "1bbc935c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "LocalStore('file://test.zarr')" ] }, - "execution_count": 9, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -263,7 +259,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 23, "id": "51953f01", "metadata": {}, "outputs": [ @@ -272,15 +268,15 @@ "output_type": "stream", "text": [ "\u001b[01;34mtest.zarr\u001b[0m\n", - "├── \u001b[00m.zarray\u001b[0m\n", - "├── \u001b[00m.zattrs\u001b[0m\n", - "├── \u001b[00m0.0\u001b[0m\n", - "├── \u001b[00m0.1\u001b[0m\n", - "├── \u001b[00m0.2\u001b[0m\n", - "├── \u001b[00m0.3\u001b[0m\n", - "├── \u001b[00m0.4\u001b[0m\n", - "├── \u001b[00m1.0\u001b[0m\n", - "├── \u001b[00m1.1\u001b[0m\n" + "├── \u001b[01;34mc\u001b[0m\n", + "│   ├── \u001b[01;34m0\u001b[0m\n", + "│   │   ├── \u001b[00m0\u001b[0m\n", + "│   │   ├── \u001b[00m1\u001b[0m\n", + "│   │   ├── \u001b[00m2\u001b[0m\n", + "│   │   ├── \u001b[00m3\u001b[0m\n", + "│   │   └── \u001b[00m4\u001b[0m\n", + "│   ├── \u001b[01;34m1\u001b[0m\n", + "│   │   ├── \u001b[00m0\u001b[0m\n" ] } ], @@ -288,57 +284,285 @@ "!tree -a test.zarr | head" ] }, + { + "cell_type": "markdown", + "id": "1e0d1a8e", + "metadata": {}, + "source": [ + "#### Compressors\n", + "A number of different compressors can be used with Zarr. The built-in options include Blosc, Zstandard, and Gzip. Additional compressors are available through the [NumCodecs](https://numcodecs.readthedocs.io) package, which supports LZ4, Zlib, BZ2, and LZMA. \n", + "\n", + "Let's check the compressor we used when creating the array:" + ] + }, { "cell_type": "code", - "execution_count": 11, - "id": "9a6365b7", + "execution_count": 29, + "id": "5263951c", "metadata": {}, "outputs": [ { - "name": "stdout", + "data": { + "text/plain": [ + "(ZstdCodec(level=0, checksum=False),)" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.compressors" + ] + }, + { + "cell_type": "markdown", + "id": "b948f73c", + "metadata": {}, + "source": [ + "If you don’t specify a compressor, by default Zarr uses the Zstandard compressor." 
+ ] + }, + { + "cell_type": "markdown", + "id": "75d91cf7", + "metadata": {}, + "source": [ + "How much space was saved by compression?\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "49bbc63e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Type : Array\n", + "Zarr format : 3\n", + "Data type : DataType.float64\n", + "Fill value : 0.0\n", + "Shape : (40, 50)\n", + "Chunk shape : (10, 10)\n", + "Order : C\n", + "Read-only : False\n", + "Store type : LocalStore\n", + "Filters : ()\n", + "Serializer : BytesCodec(endian=)\n", + "Compressors : (ZstdCodec(level=0, checksum=False),)\n", + "No. bytes : 16000 (15.6K)\n", + "No. bytes stored : 1216\n", + "Storage ratio : 13.2\n", + "Chunks Initialized : 20" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.info_complete()" + ] + }, + { + "cell_type": "markdown", + "id": "4b33663a", + "metadata": {}, + "source": [ + "You can set `compression=None` when creating a Zarr array to turn off compression. This is useful for debugging or when you want to store data without compression." + ] + }, + { + "cell_type": "markdown", + "id": "388d7c50", + "metadata": {}, + "source": [ + "```{info}\n", + "`.info_complete()` provides a more detailed view of the Zarr array, including metadata about the chunks, compressors, and attributes, but will be slower for larger arrays. \n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "cd94a896", + "metadata": {}, + "source": [ + "#### Consolidated Metadata\n", + "Zarr supports consolidated metadata, which allows you to store all metadata in a single file. This can improve performance when reading metadata, especially for large datasets.\n", + "\n", + "So far we have only been dealing in single array Zarr data stores. In this next example, we will create a zarr store with multiple arrays and then consolidate metadata. The speed up is significant when dealing in remote storage options, which we will see in the following example on accessing cloud storage." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "8498eccc", + "metadata": {}, + "outputs": [ + { + "name": "stderr", "output_type": "stream", "text": [ - "{'chunks': [10, 10], 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'dtype': '" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "import json\n", - "with open('test.zarr/.zarray') as fp:\n", - " print(json.load(fp))" + "store = zarr.storage.MemoryStore()\n", + "group = zarr.create_group(store=store)\n", + "group.create_array(shape=(1,), name='a', dtype='float64')\n", + "group.create_array(shape=(2, 2), name='b', dtype='float64')\n", + "group.create_array(shape=(3, 3, 3), name='c', dtype='float64')\n", + "zarr.consolidate_metadata(store)" + ] + }, + { + "cell_type": "markdown", + "id": "7b8d557b", + "metadata": {}, + "source": [ + "Now, if we open that group, the Group’s metadata has a zarr.core.group.ConsolidatedMetadata that can be used:" ] }, { "cell_type": "code", - "execution_count": 12, - "id": "d8f05ea3", + "execution_count": 44, + "id": "57c688fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "{'standard_name': 'wind_speed', 'units': 'm/s'}\n" + "{'a': ArrayV3Metadata(shape=(1,),\n", + " data_type=,\n", + " chunk_grid=RegularChunkGrid(chunk_shape=(1,)),\n", + " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", + " separator='/'),\n", + " fill_value=np.float64(0.0),\n", + " codecs=(BytesCodec(endian=),\n", + " ZstdCodec(level=0, checksum=False)),\n", + " attributes={},\n", + " dimension_names=None,\n", + " zarr_format=3,\n", + " node_type='array',\n", + " storage_transformers=()),\n", + " 'b': ArrayV3Metadata(shape=(2, 2),\n", + " data_type=,\n", + " chunk_grid=RegularChunkGrid(chunk_shape=(2, 2)),\n", + " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", + " separator='/'),\n", + " fill_value=np.float64(0.0),\n", + " codecs=(BytesCodec(endian=),\n", + " ZstdCodec(level=0, checksum=False)),\n", + " attributes={},\n", + " dimension_names=None,\n", + " zarr_format=3,\n", + " node_type='array',\n", + " storage_transformers=()),\n", + " 'c': ArrayV3Metadata(shape=(3, 3, 3),\n", + " data_type=,\n", + " chunk_grid=RegularChunkGrid(chunk_shape=(3, 3, 3)),\n", + " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", + " separator='/'),\n", + " fill_value=np.float64(0.0),\n", + " codecs=(BytesCodec(endian=),\n", + " ZstdCodec(level=0, checksum=False)),\n", + " attributes={},\n", + " dimension_names=None,\n", + " zarr_format=3,\n", + " node_type='array',\n", + " storage_transformers=())}\n" ] } ], "source": [ - "with open('test.zarr/.zattrs') as fp:\n", - " print(json.load(fp))" + "consolidated = zarr.open_group(store=store)\n", + "consolidated_metadata = consolidated.metadata.consolidated_metadata.metadata\n", + "from pprint import pprint\n", + "pprint(dict(sorted(consolidated_metadata.items())))" + ] + }, + { + "cell_type": "markdown", + "id": "c6454acf", + "metadata": {}, + "source": [ + "### Object Storage as a Zarr Store\n", + "\n", + "Zarr’s layout (many files/chunks per array) maps perfectly onto object storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. 
Each chunk is stored as a separate object, enabling distributed reads/writes.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "5eb5ff8b", + "metadata": {}, + "source": [ + "Here are some examples of Zarr stores on the cloud:\n", + "\n", + "* [Zarr data in Microsoft's Planetary Computer](https://planetarycomputer.microsoft.com/catalog?filter=zarr)\n", + "* [Zarr data from Google](https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&_ga=2.226354714.1000882083.1692116148-1788942020.1692116148&pli=1&q=zarr)\n", + "* [Amazon Sustainability Data Initiative available from Registry of Open Data on AWS](https://registry.opendata.aws/collab/asdi/) - Enter \"Zarr\" in the Search input box.\n", + "* [Pangeo-Forge Data Catalog](https://pangeo-forge.org/catalog)\n" ] }, + { + "cell_type": "markdown", + "id": "18c6f915", + "metadata": {}, + "source": [] + }, { "cell_type": "code", - "execution_count": null, - "id": "1e0d1a8e", + "execution_count": 49, + "id": "dba10f47", + "metadata": {}, + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'xr' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mNameError\u001b[39m Traceback (most recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[49]\u001b[39m\u001b[32m, line 3\u001b[39m\n\u001b[32m 1\u001b[39m store = \u001b[33m'\u001b[39m\u001b[33mhttps://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr\u001b[39m\u001b[33m'\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m3\u001b[39m ds = \u001b[43mxr\u001b[49m.open_dataset(store, engine=\u001b[33m'\u001b[39m\u001b[33mzarr\u001b[39m\u001b[33m'\u001b[39m, chunks={}, consolidated=\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[32m 4\u001b[39m ds\n", + "\u001b[31mNameError\u001b[39m: name 'xr' is not defined" + ] + } + ], + "source": [ + "store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr'\n", + "\n", + "ds = xr.open_dataset(store, engine='zarr', chunks={}, consolidated=True)\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "id": "9c4af068", "metadata": {}, - "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "ERA5_interactive", + "display_name": "zarr_tutorial", "language": "python", "name": "python3" }, @@ -352,7 +576,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.9" + "version": "3.13.5" } }, "nbformat": 4, From 95644a0ef89e31cf0463e1e1ba3384563b067e70 Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:10:03 -0600 Subject: [PATCH 06/20] updates and cloud storage --- intermediate/intro-to-zarr.ipynb | 1717 +++++++++++++++++++++++++++++- 1 file changed, 1665 insertions(+), 52 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index ce2daed3..e3d044d3 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -48,19 +48,27 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 1, "id": "ae9c38ed", "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" + "ename": "ContainsArrayError", + "evalue": "An array exists in store LocalStore('file://test.zarr') at path ''.", + "output_type": "error", + "traceback": [ + 
"\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mContainsArrayError\u001b[39m Traceback (most recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mzarr\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m z = \u001b[43mzarr\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m=\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m40\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m50\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m10\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m10\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mf8\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstore\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mtest.zarr\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m 3\u001b[39m z\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/api/synchronous.py:717\u001b[39m, in \u001b[36mcreate\u001b[39m\u001b[34m(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, write_empty_chunks, zarr_version, zarr_format, meta_array, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, storage_options, config, **kwargs)\u001b[39m\n\u001b[32m 602\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mcreate\u001b[39m(\n\u001b[32m 603\u001b[39m shape: ChunkCoords | \u001b[38;5;28mint\u001b[39m,\n\u001b[32m 604\u001b[39m *, \u001b[38;5;66;03m# Note: this is a change from v2\u001b[39;00m\n\u001b[32m (...)\u001b[39m\u001b[32m 638\u001b[39m **kwargs: Any,\n\u001b[32m 639\u001b[39m ) -> Array:\n\u001b[32m 640\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"Create an array.\u001b[39;00m\n\u001b[32m 641\u001b[39m \n\u001b[32m 642\u001b[39m \u001b[33;03m Parameters\u001b[39;00m\n\u001b[32m (...)\u001b[39m\u001b[32m 714\u001b[39m \u001b[33;03m The array.\u001b[39;00m\n\u001b[32m 715\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m 716\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m Array(\n\u001b[32m--> \u001b[39m\u001b[32m717\u001b[39m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 718\u001b[39m \u001b[43m \u001b[49m\u001b[43masync_api\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 719\u001b[39m \u001b[43m \u001b[49m\u001b[43mshape\u001b[49m\u001b[43m=\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 720\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 721\u001b[39m \u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 722\u001b[39m \u001b[43m \u001b[49m\u001b[43mcompressor\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcompressor\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 723\u001b[39m \u001b[43m 
\u001b[49m\u001b[43mfill_value\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfill_value\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 724\u001b[39m \u001b[43m \u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 725\u001b[39m \u001b[43m \u001b[49m\u001b[43mstore\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstore\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 726\u001b[39m \u001b[43m \u001b[49m\u001b[43msynchronizer\u001b[49m\u001b[43m=\u001b[49m\u001b[43msynchronizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 727\u001b[39m \u001b[43m \u001b[49m\u001b[43moverwrite\u001b[49m\u001b[43m=\u001b[49m\u001b[43moverwrite\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 728\u001b[39m \u001b[43m \u001b[49m\u001b[43mpath\u001b[49m\u001b[43m=\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 729\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_store\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_store\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 730\u001b[39m \u001b[43m \u001b[49m\u001b[43mfilters\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfilters\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 731\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_metadata\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_metadata\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 732\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_attrs\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_attrs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 733\u001b[39m \u001b[43m \u001b[49m\u001b[43mread_only\u001b[49m\u001b[43m=\u001b[49m\u001b[43mread_only\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 734\u001b[39m \u001b[43m \u001b[49m\u001b[43mobject_codec\u001b[49m\u001b[43m=\u001b[49m\u001b[43mobject_codec\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 735\u001b[39m \u001b[43m \u001b[49m\u001b[43mdimension_separator\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdimension_separator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 736\u001b[39m \u001b[43m \u001b[49m\u001b[43mwrite_empty_chunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43mwrite_empty_chunks\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 737\u001b[39m \u001b[43m \u001b[49m\u001b[43mzarr_version\u001b[49m\u001b[43m=\u001b[49m\u001b[43mzarr_version\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 738\u001b[39m \u001b[43m \u001b[49m\u001b[43mzarr_format\u001b[49m\u001b[43m=\u001b[49m\u001b[43mzarr_format\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 739\u001b[39m \u001b[43m \u001b[49m\u001b[43mmeta_array\u001b[49m\u001b[43m=\u001b[49m\u001b[43mmeta_array\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 740\u001b[39m \u001b[43m \u001b[49m\u001b[43mattributes\u001b[49m\u001b[43m=\u001b[49m\u001b[43mattributes\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 741\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_shape\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_shape\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 742\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_key_encoding\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_key_encoding\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 743\u001b[39m \u001b[43m \u001b[49m\u001b[43mcodecs\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcodecs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 744\u001b[39m \u001b[43m \u001b[49m\u001b[43mdimension_names\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdimension_names\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 745\u001b[39m \u001b[43m \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 746\u001b[39m \u001b[43m 
\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 747\u001b[39m \u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 748\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 749\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 750\u001b[39m )\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:163\u001b[39m, in \u001b[36msync\u001b[39m\u001b[34m(coro, loop, timeout)\u001b[39m\n\u001b[32m 160\u001b[39m return_result = \u001b[38;5;28mnext\u001b[39m(\u001b[38;5;28miter\u001b[39m(finished)).result()\n\u001b[32m 162\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[32m--> \u001b[39m\u001b[32m163\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[32m 164\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 165\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m return_result\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:119\u001b[39m, in \u001b[36m_runner\u001b[39m\u001b[34m(coro)\u001b[39m\n\u001b[32m 114\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 115\u001b[39m \u001b[33;03mAwait a coroutine and return the result of running it. If awaiting the coroutine raises an\u001b[39;00m\n\u001b[32m 116\u001b[39m \u001b[33;03mexception, the exception will be returned.\u001b[39;00m\n\u001b[32m 117\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 118\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m119\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m coro\n\u001b[32m 120\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n\u001b[32m 121\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ex\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/api/asynchronous.py:1065\u001b[39m, in \u001b[36mcreate\u001b[39m\u001b[34m(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, write_empty_chunks, zarr_version, zarr_format, meta_array, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, storage_options, config, **kwargs)\u001b[39m\n\u001b[32m 1061\u001b[39m warnings.warn(\u001b[38;5;167;01mUserWarning\u001b[39;00m(msg), stacklevel=\u001b[32m1\u001b[39m)\n\u001b[32m 1063\u001b[39m config_parsed = ArrayConfig.from_dict(config_dict)\n\u001b[32m-> \u001b[39m\u001b[32m1065\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m AsyncArray._create(\n\u001b[32m 1066\u001b[39m store_path,\n\u001b[32m 1067\u001b[39m shape=shape,\n\u001b[32m 1068\u001b[39m chunks=chunks,\n\u001b[32m 1069\u001b[39m dtype=dtype,\n\u001b[32m 1070\u001b[39m compressor=compressor,\n\u001b[32m 1071\u001b[39m fill_value=fill_value,\n\u001b[32m 1072\u001b[39m overwrite=overwrite,\n\u001b[32m 1073\u001b[39m filters=filters,\n\u001b[32m 1074\u001b[39m dimension_separator=dimension_separator,\n\u001b[32m 1075\u001b[39m order=order,\n\u001b[32m 1076\u001b[39m zarr_format=zarr_format,\n\u001b[32m 1077\u001b[39m 
chunk_shape=chunk_shape,\n\u001b[32m 1078\u001b[39m chunk_key_encoding=chunk_key_encoding,\n\u001b[32m 1079\u001b[39m codecs=codecs,\n\u001b[32m 1080\u001b[39m dimension_names=dimension_names,\n\u001b[32m 1081\u001b[39m attributes=attributes,\n\u001b[32m 1082\u001b[39m config=config_parsed,\n\u001b[32m 1083\u001b[39m **kwargs,\n\u001b[32m 1084\u001b[39m )\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/array.py:608\u001b[39m, in \u001b[36mAsyncArray._create\u001b[39m\u001b[34m(cls, store, shape, dtype, zarr_format, fill_value, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, chunks, dimension_separator, order, filters, compressor, overwrite, data, config)\u001b[39m\n\u001b[32m 605\u001b[39m _warn_order_kwarg()\n\u001b[32m 606\u001b[39m config_parsed = replace(config_parsed, order=order)\n\u001b[32m--> \u001b[39m\u001b[32m608\u001b[39m result = \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mcls\u001b[39m._create_v3(\n\u001b[32m 609\u001b[39m store_path,\n\u001b[32m 610\u001b[39m shape=shape,\n\u001b[32m 611\u001b[39m dtype=dtype_parsed,\n\u001b[32m 612\u001b[39m chunk_shape=_chunks,\n\u001b[32m 613\u001b[39m fill_value=fill_value,\n\u001b[32m 614\u001b[39m chunk_key_encoding=chunk_key_encoding,\n\u001b[32m 615\u001b[39m codecs=codecs,\n\u001b[32m 616\u001b[39m dimension_names=dimension_names,\n\u001b[32m 617\u001b[39m attributes=attributes,\n\u001b[32m 618\u001b[39m overwrite=overwrite,\n\u001b[32m 619\u001b[39m config=config_parsed,\n\u001b[32m 620\u001b[39m )\n\u001b[32m 621\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m zarr_format == \u001b[32m2\u001b[39m:\n\u001b[32m 622\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m codecs \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/array.py:730\u001b[39m, in \u001b[36mAsyncArray._create_v3\u001b[39m\u001b[34m(cls, store_path, shape, dtype, chunk_shape, config, fill_value, chunk_key_encoding, codecs, dimension_names, attributes, overwrite)\u001b[39m\n\u001b[32m 728\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m ensure_no_existing_node(store_path, zarr_format=\u001b[32m3\u001b[39m)\n\u001b[32m 729\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m730\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m ensure_no_existing_node(store_path, zarr_format=\u001b[32m3\u001b[39m)\n\u001b[32m 732\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(chunk_key_encoding, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[32m 733\u001b[39m chunk_key_encoding = (\n\u001b[32m 734\u001b[39m V2ChunkKeyEncoding(separator=chunk_key_encoding[\u001b[32m1\u001b[39m])\n\u001b[32m 735\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m chunk_key_encoding[\u001b[32m0\u001b[39m] == \u001b[33m\"\u001b[39m\u001b[33mv2\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 736\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m DefaultChunkKeyEncoding(separator=chunk_key_encoding[\u001b[32m1\u001b[39m])\n\u001b[32m 737\u001b[39m )\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/storage/_common.py:414\u001b[39m, in \u001b[36mensure_no_existing_node\u001b[39m\u001b[34m(store_path, zarr_format)\u001b[39m\n\u001b[32m 411\u001b[39m extant_node = \u001b[38;5;28;01mawait\u001b[39;00m _contains_node_v3(store_path)\n\u001b[32m 413\u001b[39m 
\u001b[38;5;28;01mif\u001b[39;00m extant_node == \u001b[33m\"\u001b[39m\u001b[33marray\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m--> \u001b[39m\u001b[32m414\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m ContainsArrayError(store_path.store, store_path.path)\n\u001b[32m 415\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m extant_node == \u001b[33m\"\u001b[39m\u001b[33mgroup\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 416\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m ContainsGroupError(store_path.store, store_path.path)\n", + "\u001b[31mContainsArrayError\u001b[39m: An array exists in store LocalStore('file://test.zarr') at path ''." + ] } ], "source": [ @@ -71,7 +79,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "id": "0f39867a", "metadata": {}, "outputs": [ @@ -93,7 +101,7 @@ "No. bytes : 16000 (15.6K)" ] }, - "execution_count": 14, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } @@ -104,7 +112,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "id": "dbe47985", "metadata": {}, "outputs": [ @@ -114,7 +122,7 @@ "np.float64(0.0)" ] }, - "execution_count": 15, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -133,7 +141,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "id": "7d905f06", "metadata": {}, "outputs": [ @@ -143,7 +151,7 @@ "array(0.)" ] }, - "execution_count": 16, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -162,36 +170,32 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "id": "1ccc28b6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Type : Array\n", - "Zarr format : 3\n", - "Data type : DataType.float64\n", - "Fill value : 0.0\n", - "Shape : (40, 50)\n", - "Chunk shape : (10, 10)\n", - "Order : C\n", - "Read-only : False\n", - "Store type : LocalStore\n", - "Filters : ()\n", - "Serializer : BytesCodec(endian=)\n", - "Compressors : (ZstdCodec(level=0, checksum=False),)\n", - "No. bytes : 16000 (15.6K)" + "array([[ 0., 1., 2., ..., 47., 48., 49.],\n", + " [ 1., 1., 1., ..., 1., 1., 1.],\n", + " [ 1., 1., 1., ..., 1., 1., 1.],\n", + " ...,\n", + " [ 1., 1., 1., ..., 1., 1., 1.],\n", + " [ 1., 1., 1., ..., 1., 1., 1.],\n", + " [ 1., 1., 1., ..., 1., 1., 1.]], shape=(40, 50))" ] }, - "execution_count": 17, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ + "import numpy as np\n", "z[:] = 1\n", - "z.info" + "z[0, :] = np.arange(50)\n", + "z[:]" ] }, { @@ -206,7 +210,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "id": "859c9cfe", "metadata": {}, "outputs": [ @@ -238,7 +242,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "id": "1bbc935c", "metadata": {}, "outputs": [ @@ -248,7 +252,7 @@ "LocalStore('file://test.zarr')" ] }, - "execution_count": 19, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -259,7 +263,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "id": "51953f01", "metadata": {}, "outputs": [ @@ -284,6 +288,130 @@ "!tree -a test.zarr | head" ] }, + { + "cell_type": "markdown", + "id": "35f5384e", + "metadata": {}, + "source": [ + "To create groups in your store, use the `create_group` method after creating a root group. Here, we’ll create two groups, `temp` and `precip`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3c377a0", + "metadata": {}, + "outputs": [], + "source": [ + "root = zarr.group()\n", + "temp = root.create_group('temp')\n", + "precip = root.create_group('precip')\n", + "t2m = temp.create_array('t2m', shape=(100,100), chunks=(10,10), dtype='i4')\n", + "prcp = precip.create_array('prcp', shape=(1000,1000), chunks=(10,10), dtype='i4')" + ] + }, + { + "cell_type": "markdown", + "id": "45fb5bc2", + "metadata": {}, + "source": [ + "Groups can easily be accessed by name and index.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eecad1a6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "root['temp']\n", + "root['temp/t2m'][:, 3]" + ] + }, + { + "cell_type": "markdown", + "id": "f52c70ba", + "metadata": {}, + "source": [ + "To get a look at your overall dataset, the `tree` and `info` methods are helpful.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "164febda", + "metadata": {}, + "outputs": [ + { + "ename": "ImportError", + "evalue": "'rich' is required for Group.tree", + "output_type": "error", + "traceback": [ + "\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/_tree.py:9\u001b[39m\n\u001b[32m 8\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m----> \u001b[39m\u001b[32m9\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\n\u001b[32m 10\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mconsole\u001b[39;00m\n", + "\u001b[31mModuleNotFoundError\u001b[39m: No module named 'rich'", + "\nThe above exception was the direct cause of the following exception:\n", + "\u001b[31mImportError\u001b[39m Traceback (most recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[11]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mroot\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/group.py:2361\u001b[39m, in \u001b[36mGroup.tree\u001b[39m\u001b[34m(self, expand, level)\u001b[39m\n\u001b[32m 2342\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mtree\u001b[39m(\u001b[38;5;28mself\u001b[39m, expand: \u001b[38;5;28mbool\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m, level: \u001b[38;5;28mint\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m) -> Any:\n\u001b[32m 2343\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 2344\u001b[39m \u001b[33;03m 
Return a tree-like representation of a hierarchy.\u001b[39;00m\n\u001b[32m 2345\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 2359\u001b[39m \u001b[33;03m A pretty-printable object displaying the hierarchy.\u001b[39;00m\n\u001b[32m 2360\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m2361\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_sync\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_async_group\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m=\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m=\u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:208\u001b[39m, in \u001b[36mSyncMixin._sync\u001b[39m\u001b[34m(self, coroutine)\u001b[39m\n\u001b[32m 205\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34m_sync\u001b[39m(\u001b[38;5;28mself\u001b[39m, coroutine: Coroutine[Any, Any, T]) -> T:\n\u001b[32m 206\u001b[39m \u001b[38;5;66;03m# TODO: refactor this to to take *args and **kwargs and pass those to the method\u001b[39;00m\n\u001b[32m 207\u001b[39m \u001b[38;5;66;03m# this should allow us to better type the sync wrapper\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m208\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 209\u001b[39m \u001b[43m \u001b[49m\u001b[43mcoroutine\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 210\u001b[39m \u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43masync.timeout\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 211\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:163\u001b[39m, in \u001b[36msync\u001b[39m\u001b[34m(coro, loop, timeout)\u001b[39m\n\u001b[32m 160\u001b[39m return_result = \u001b[38;5;28mnext\u001b[39m(\u001b[38;5;28miter\u001b[39m(finished)).result()\n\u001b[32m 162\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[32m--> \u001b[39m\u001b[32m163\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[32m 164\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 165\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m return_result\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:119\u001b[39m, in \u001b[36m_runner\u001b[39m\u001b[34m(coro)\u001b[39m\n\u001b[32m 114\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 115\u001b[39m \u001b[33;03mAwait a coroutine and return the result of running it. 
If awaiting the coroutine raises an\u001b[39;00m\n\u001b[32m 116\u001b[39m \u001b[33;03mexception, the exception will be returned.\u001b[39;00m\n\u001b[32m 117\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 118\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m119\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m coro\n\u001b[32m 120\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n\u001b[32m 121\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ex\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/group.py:1583\u001b[39m, in \u001b[36mAsyncGroup.tree\u001b[39m\u001b[34m(self, expand, level)\u001b[39m\n\u001b[32m 1564\u001b[39m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mtree\u001b[39m(\u001b[38;5;28mself\u001b[39m, expand: \u001b[38;5;28mbool\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m, level: \u001b[38;5;28mint\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m) -> Any:\n\u001b[32m 1565\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 1566\u001b[39m \u001b[33;03m Return a tree-like representation of a hierarchy.\u001b[39;00m\n\u001b[32m 1567\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 1581\u001b[39m \u001b[33;03m A pretty-printable object displaying the hierarchy.\u001b[39;00m\n\u001b[32m 1582\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1583\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mzarr\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mcore\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01m_tree\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m group_tree_async\n\u001b[32m 1585\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m expand \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m 1586\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m'\u001b[39m\u001b[33mexpand\u001b[39m\u001b[33m'\u001b[39m\u001b[33m is not yet implemented.\u001b[39m\u001b[33m\"\u001b[39m)\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/_tree.py:13\u001b[39m\n\u001b[32m 11\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mtree\u001b[39;00m\n\u001b[32m 12\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[32m---> \u001b[39m\u001b[32m13\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m'\u001b[39m\u001b[33mrich\u001b[39m\u001b[33m'\u001b[39m\u001b[33m is required for Group.tree\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01me\u001b[39;00m\n\u001b[32m 16\u001b[39m \u001b[38;5;28;01mclass\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mTreeRepr\u001b[39;00m:\n\u001b[32m 17\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 18\u001b[39m \u001b[33;03m A simple object with a tree-like 
repr for the Zarr Group.\u001b[39;00m\n\u001b[32m 19\u001b[39m \n\u001b[32m 20\u001b[39m \u001b[33;03m Note that this object and it's implementation isn't considered part\u001b[39;00m\n\u001b[32m 21\u001b[39m \u001b[33;03m of Zarr's public API.\u001b[39;00m\n\u001b[32m 22\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n", + "\u001b[31mImportError\u001b[39m: 'rich' is required for Group.tree" + ] + } + ], + "source": [ + "root.tree()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2363137a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Name : \n", + "Type : Group\n", + "Zarr format : 3\n", + "Read-only : False\n", + "Store type : MemoryStore" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "root.info" + ] + }, { "cell_type": "markdown", "id": "1e0d1a8e", @@ -297,7 +425,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "id": "5263951c", "metadata": {}, "outputs": [ @@ -334,7 +462,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "id": "49bbc63e", "metadata": {}, "outputs": [ @@ -399,7 +527,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": null, "id": "8498eccc", "metadata": {}, "outputs": [ @@ -441,7 +569,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": null, "id": "57c688fc", "metadata": {}, "outputs": [ @@ -503,7 +631,7 @@ "id": "c6454acf", "metadata": {}, "source": [ - "### Object Storage as a Zarr Store\n", + "## Object Storage as a Zarr Store\n", "\n", "Zarr’s layout (many files/chunks per array) maps perfectly onto object storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. Each chunk is stored as a separate object, enabling distributed reads/writes.\n", "\n" @@ -526,24 +654,852 @@ "cell_type": "markdown", "id": "18c6f915", "metadata": {}, - "source": [] + "source": [ + "### Xarray and Zarr\n", + "\n", + "Xarray has built-in support for reading and writing Zarr data. You can use the `xarray.open_zarr()` function to open a Zarr store as an Xarray dataset.\n", + "\n" + ] }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 15, "id": "dba10f47", "metadata": {}, "outputs": [ { - "ename": "NameError", - "evalue": "name 'xr' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[31m---------------------------------------------------------------------------\u001b[39m", - "\u001b[31mNameError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[49]\u001b[39m\u001b[32m, line 3\u001b[39m\n\u001b[32m 1\u001b[39m store = \u001b[33m'\u001b[39m\u001b[33mhttps://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr\u001b[39m\u001b[33m'\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m3\u001b[39m ds = \u001b[43mxr\u001b[49m.open_dataset(store, engine=\u001b[33m'\u001b[39m\u001b[33mzarr\u001b[39m\u001b[33m'\u001b[39m, chunks={}, consolidated=\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[32m 4\u001b[39m ds\n", - "\u001b[31mNameError\u001b[39m: name 'xr' is not defined" - ] + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.Dataset> Size: 2GB\n",
+       "Dimensions:      (latitude: 180, nv: 2, longitude: 360, time: 9226)\n",
+       "Coordinates:\n",
+       "    lat_bounds   (latitude, nv) float32 1kB dask.array<chunksize=(180, 2), meta=np.ndarray>\n",
+       "  * latitude     (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n",
+       "    lon_bounds   (longitude, nv) float32 3kB dask.array<chunksize=(360, 2), meta=np.ndarray>\n",
+       "  * longitude    (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
+       "  * time         (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n",
+       "    time_bounds  (time, nv) datetime64[ns] 148kB dask.array<chunksize=(200, 2), meta=np.ndarray>\n",
+       "Dimensions without coordinates: nv\n",
+       "Data variables:\n",
+       "    precip       (time, latitude, longitude) float32 2GB dask.array<chunksize=(200, 180, 360), meta=np.ndarray>\n",
+       "Attributes: (12/45)\n",
+       "    Conventions:                CF-1.6, ACDD 1.3\n",
+       "    Metadata_Conventions:       CF-1.6, Unidata Dataset Discovery v1.0, NOAA ...\n",
+       "    acknowledgment:             This project was supported in part by a grant...\n",
+       "    cdm_data_type:              Grid\n",
+       "    cdr_program:                NOAA Climate Data Record Program for satellit...\n",
+       "    cdr_variable:               precipitation\n",
+       "    ...                         ...\n",
+       "    standard_name_vocabulary:   CF Standard Name Table (v41, 22 February 2017)\n",
+       "    summary:                    Global Precipitation Climatology Project (GPC...\n",
+       "    time_coverage_duration:     P1D\n",
+       "    time_coverage_end:          1996-10-01T23:59:59Z\n",
+       "    time_coverage_start:        1996-10-01T00:00:00Z\n",
+       "    title:                      Global Precipitation Climatatology Project (G...
" + ], + "text/plain": [ + " Size: 2GB\n", + "Dimensions: (latitude: 180, nv: 2, longitude: 360, time: 9226)\n", + "Coordinates:\n", + " lat_bounds (latitude, nv) float32 1kB dask.array\n", + " * latitude (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n", + " lon_bounds (longitude, nv) float32 3kB dask.array\n", + " * longitude (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n", + " * time (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n", + " time_bounds (time, nv) datetime64[ns] 148kB dask.array\n", + "Dimensions without coordinates: nv\n", + "Data variables:\n", + " precip (time, latitude, longitude) float32 2GB dask.array\n", + "Attributes: (12/45)\n", + " Conventions: CF-1.6, ACDD 1.3\n", + " Metadata_Conventions: CF-1.6, Unidata Dataset Discovery v1.0, NOAA ...\n", + " acknowledgment: This project was supported in part by a grant...\n", + " cdm_data_type: Grid\n", + " cdr_program: NOAA Climate Data Record Program for satellit...\n", + " cdr_variable: precipitation\n", + " ... ...\n", + " standard_name_vocabulary: CF Standard Name Table (v41, 22 February 2017)\n", + " summary: Global Precipitation Climatology Project (GPC...\n", + " time_coverage_duration: P1D\n", + " time_coverage_end: 1996-10-01T23:59:59Z\n", + " time_coverage_start: 1996-10-01T00:00:00Z\n", + " title: Global Precipitation Climatatology Project (G..." + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ @@ -553,10 +1509,667 @@ "ds" ] }, + { + "cell_type": "code", + "execution_count": 13, + "id": "9d48c39e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'precip' (time: 9226, latitude: 180, longitude: 360)> Size: 2GB\n",
+       "dask.array<open_dataset-precip, shape=(9226, 180, 360), dtype=float32, chunksize=(200, 180, 360), chunktype=numpy.ndarray>\n",
+       "Coordinates:\n",
+       "  * latitude   (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n",
+       "  * longitude  (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
+       "  * time       (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n",
+       "Attributes:\n",
+       "    cell_methods:   area: mean time: mean\n",
+       "    long_name:      NOAA Climate Data Record (CDR) of Daily GPCP Satellite-Ga...\n",
+       "    standard_name:  lwe_precipitation_rate\n",
+       "    units:          mm/day\n",
+       "    valid_range:    [0.0, 100.0]
" + ], + "text/plain": [ + " Size: 2GB\n", + "dask.array\n", + "Coordinates:\n", + " * latitude (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n", + " * longitude (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n", + " * time (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n", + "Attributes:\n", + " cell_methods: area: mean time: mean\n", + " long_name: NOAA Climate Data Record (CDR) of Daily GPCP Satellite-Ga...\n", + " standard_name: lwe_precipitation_rate\n", + " units: mm/day\n", + " valid_range: [0.0, 100.0]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds.precip" + ] + }, { "cell_type": "markdown", - "id": "9c4af068", + "id": "76756fb5", + "metadata": {}, + "source": [ + "::::{admonition} Exercise\n", + ":class: tip\n", + "\n", + "Can you calculate the mean precipitation over the time dimension in the GPCP dataset and plot it?\n", + "\n", + ":::{admonition} Solution\n", + ":class: dropdown\n", + "\n", + "```python\n", + "ds.precip.mean(dim='time').plot()\n", + "\n", + "```\n", + ":::\n", + "::::" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23a1ef1a", "metadata": {}, + "outputs": [], "source": [] } ], From be3c3116c262e6ae5a4becc9a017d699fa7db2fb Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:17:24 -0600 Subject: [PATCH 07/20] additional resources --- intermediate/intro-to-zarr.ipynb | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index e3d044d3..3fb2cc04 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -2165,11 +2165,23 @@ ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "id": "23a1ef1a", "metadata": {}, - "outputs": [], + "source": [ + "In the next exercise, you will use the Xarray + Zarr to open CMIP6 dataset.\n", + "\n", + "## Additional Resources\n", + "\n", + "- [Zarr Documentation](https://zarr.readthedocs.io/en/stable/)\n", + "- [Cloud Optimized Geospatial Formats](https://guide.cloudnativegeo.org/zarr/zarr-in-practice.html)\n", + "- [Scalable and Computationally Reproducible Approaches to Arctic Research](https://learning.nceas.ucsb.edu/2025-04-arctic/sections/zarr.html)\n" + ] + }, + { + "cell_type": "markdown", + "id": "5ed9088a", + "metadata": {}, "source": [] } ], From 95868b8d9b8c98081db4b2817ac696dfeb9461ab Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:32:12 -0600 Subject: [PATCH 08/20] updates --- intermediate/intro-to-zarr.ipynb | 78 +++++++++++++++++++------------- 1 file changed, 46 insertions(+), 32 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 3fb2cc04..3a619044 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -7,23 +7,35 @@ "source": [ "# Introduction to Zarr\n", "\n", + "## Learning Objectives:\n", + "\n", + "- Understand the principles of the Zarr file format\n", + "- Learn how to read and write Zarr files using the `zarr-python` library\n", + "- Explore how to use Zarr files with `xarray` for data analysis and visualization\n", + "\n", "This notebook provides a brief introduction to Zarr and how to\n", "use it in cloud environments for scalable, chunked, and compressed data storage.\n", + "\n", "Zarr is a file format with implementations in different languages. 
In this tutorial, we will look at an example of how to use the Zarr format by looking at some features of the `zarr-python` library and how Zarr files can be opened with `xarray`.\n", "\n", "## What is Zarr?\n", "\n", - "The Zarr data format is an open, community-maintained format designed for efficient, scalable storage of large N-dimensional arrays. It stores data as compressed and chunked arrays in a format well-suited to parallel processing and cloud-native workflows.\n", + "The Zarr data format is an open, community-maintained format designed for efficient, scalable storage of large N-dimensional arrays. It stores data as compressed and chunked arrays in a format well-suited to parallel processing and cloud-native workflows. \n", "\n", "### Zarr Data Organization:\n", "- **Arrays**: N-dimensional arrays that can be chunked and compressed.\n", "- **Groups**: A container for organizing multiple arrays and other groups with a hierarchical structure.\n", - "- **Metadata**: JSON-like metadata describing the arrays and groups, including information about dimensions, data types, and compression.\n", + "- **Metadata**: JSON-like metadata describing the arrays and groups, including information about dimensions, data types, groups, and compression.\n", "- **Dimensions and Shape**: Arrays can have any number of dimensions, and their shape is defined by the number of elements in each dimension.\n", "- **Coordinates & Indexing**: Zarr supports coordinate arrays for each dimension, allowing for efficient indexing and slicing.\n", "\n", - "The diagram below showing the structure of a Zarr file:\n", - "![EarthData](https://learning.nceas.ucsb.edu/2025-04-arctic/images/zarr-chunks.png)\n" + "The diagram below from [the NASA Earthdata wiki](https://wiki.earthdata.nasa.gov/display/ESO/Zarr+Format) showing the structure of a Zarr store:\n", + "\n", + "![EarthData](https://learning.nceas.ucsb.edu/2025-04-arctic/images/zarr-chunks.png)\n", + "\n", + "\n", + "NetCDF and Zarr share similar terminology and functionality, but the key difference is that NetCDF is a single file, while Zarr is a directory-based “store” composed of many chunked files—making it better suited for distributed and cloud-based workflows.\n", + "\n" ] }, { @@ -40,46 +52,48 @@ "- **Shape**: The dimensions of the array.\n", "- **Dtype**: The data type of each element (e.g., float32).\n", "- **Attributes**: Metadata stored as key-value pairs (e.g., units, description.\n", - "- **Compressors**: Algorithms used to compress each chunk (e.g., Blosc, Zlib).\n", + "- **Compressors**: Algorithms used to compress each chunk (e.g., Zstd, Blosc, Zlib).\n", + "\n", "\n", + "#### Example: Creating and Inspecting a Zarr Array\n", "\n", - "#### Example: Creating and Inspecting a Zarr Array" + "Here we create a simple 2D array of shape `(40, 50)` with chunks of size `(10, 10)` ,write to the `LocalStore` in the `test.zarr` directory. 
\n" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 16, "id": "ae9c38ed", "metadata": {}, "outputs": [ { - "ename": "ContainsArrayError", - "evalue": "An array exists in store LocalStore('file://test.zarr') at path ''.", - "output_type": "error", - "traceback": [ - "\u001b[31m---------------------------------------------------------------------------\u001b[39m", - "\u001b[31mContainsArrayError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mzarr\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m z = \u001b[43mzarr\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m=\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m40\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m50\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43m(\u001b[49m\u001b[32;43m10\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m10\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mf8\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstore\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m'\u001b[39;49m\u001b[33;43mtest.zarr\u001b[39;49m\u001b[33;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m 3\u001b[39m z\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/api/synchronous.py:717\u001b[39m, in \u001b[36mcreate\u001b[39m\u001b[34m(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, write_empty_chunks, zarr_version, zarr_format, meta_array, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, storage_options, config, **kwargs)\u001b[39m\n\u001b[32m 602\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mcreate\u001b[39m(\n\u001b[32m 603\u001b[39m shape: ChunkCoords | \u001b[38;5;28mint\u001b[39m,\n\u001b[32m 604\u001b[39m *, \u001b[38;5;66;03m# Note: this is a change from v2\u001b[39;00m\n\u001b[32m (...)\u001b[39m\u001b[32m 638\u001b[39m **kwargs: Any,\n\u001b[32m 639\u001b[39m ) -> Array:\n\u001b[32m 640\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"Create an array.\u001b[39;00m\n\u001b[32m 641\u001b[39m \n\u001b[32m 642\u001b[39m \u001b[33;03m Parameters\u001b[39;00m\n\u001b[32m (...)\u001b[39m\u001b[32m 714\u001b[39m \u001b[33;03m The array.\u001b[39;00m\n\u001b[32m 715\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m 716\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m Array(\n\u001b[32m--> \u001b[39m\u001b[32m717\u001b[39m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 718\u001b[39m \u001b[43m \u001b[49m\u001b[43masync_api\u001b[49m\u001b[43m.\u001b[49m\u001b[43mcreate\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 719\u001b[39m \u001b[43m \u001b[49m\u001b[43mshape\u001b[49m\u001b[43m=\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 720\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunks\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 721\u001b[39m \u001b[43m 
\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 722\u001b[39m \u001b[43m \u001b[49m\u001b[43mcompressor\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcompressor\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 723\u001b[39m \u001b[43m \u001b[49m\u001b[43mfill_value\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfill_value\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 724\u001b[39m \u001b[43m \u001b[49m\u001b[43morder\u001b[49m\u001b[43m=\u001b[49m\u001b[43morder\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 725\u001b[39m \u001b[43m \u001b[49m\u001b[43mstore\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstore\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 726\u001b[39m \u001b[43m \u001b[49m\u001b[43msynchronizer\u001b[49m\u001b[43m=\u001b[49m\u001b[43msynchronizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 727\u001b[39m \u001b[43m \u001b[49m\u001b[43moverwrite\u001b[49m\u001b[43m=\u001b[49m\u001b[43moverwrite\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 728\u001b[39m \u001b[43m \u001b[49m\u001b[43mpath\u001b[49m\u001b[43m=\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 729\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_store\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_store\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 730\u001b[39m \u001b[43m \u001b[49m\u001b[43mfilters\u001b[49m\u001b[43m=\u001b[49m\u001b[43mfilters\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 731\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_metadata\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_metadata\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 732\u001b[39m \u001b[43m \u001b[49m\u001b[43mcache_attrs\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcache_attrs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 733\u001b[39m \u001b[43m \u001b[49m\u001b[43mread_only\u001b[49m\u001b[43m=\u001b[49m\u001b[43mread_only\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 734\u001b[39m \u001b[43m \u001b[49m\u001b[43mobject_codec\u001b[49m\u001b[43m=\u001b[49m\u001b[43mobject_codec\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 735\u001b[39m \u001b[43m \u001b[49m\u001b[43mdimension_separator\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdimension_separator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 736\u001b[39m \u001b[43m \u001b[49m\u001b[43mwrite_empty_chunks\u001b[49m\u001b[43m=\u001b[49m\u001b[43mwrite_empty_chunks\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 737\u001b[39m \u001b[43m \u001b[49m\u001b[43mzarr_version\u001b[49m\u001b[43m=\u001b[49m\u001b[43mzarr_version\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 738\u001b[39m \u001b[43m \u001b[49m\u001b[43mzarr_format\u001b[49m\u001b[43m=\u001b[49m\u001b[43mzarr_format\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 739\u001b[39m \u001b[43m \u001b[49m\u001b[43mmeta_array\u001b[49m\u001b[43m=\u001b[49m\u001b[43mmeta_array\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 740\u001b[39m \u001b[43m \u001b[49m\u001b[43mattributes\u001b[49m\u001b[43m=\u001b[49m\u001b[43mattributes\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 741\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_shape\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_shape\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 742\u001b[39m \u001b[43m \u001b[49m\u001b[43mchunk_key_encoding\u001b[49m\u001b[43m=\u001b[49m\u001b[43mchunk_key_encoding\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 743\u001b[39m \u001b[43m \u001b[49m\u001b[43mcodecs\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcodecs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 744\u001b[39m \u001b[43m 
\u001b[49m\u001b[43mdimension_names\u001b[49m\u001b[43m=\u001b[49m\u001b[43mdimension_names\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 745\u001b[39m \u001b[43m \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m=\u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 746\u001b[39m \u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 747\u001b[39m \u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 748\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 749\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 750\u001b[39m )\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:163\u001b[39m, in \u001b[36msync\u001b[39m\u001b[34m(coro, loop, timeout)\u001b[39m\n\u001b[32m 160\u001b[39m return_result = \u001b[38;5;28mnext\u001b[39m(\u001b[38;5;28miter\u001b[39m(finished)).result()\n\u001b[32m 162\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[32m--> \u001b[39m\u001b[32m163\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[32m 164\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 165\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m return_result\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:119\u001b[39m, in \u001b[36m_runner\u001b[39m\u001b[34m(coro)\u001b[39m\n\u001b[32m 114\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 115\u001b[39m \u001b[33;03mAwait a coroutine and return the result of running it. 
If awaiting the coroutine raises an\u001b[39;00m\n\u001b[32m 116\u001b[39m \u001b[33;03mexception, the exception will be returned.\u001b[39;00m\n\u001b[32m 117\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 118\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m119\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m coro\n\u001b[32m 120\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n\u001b[32m 121\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ex\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/api/asynchronous.py:1065\u001b[39m, in \u001b[36mcreate\u001b[39m\u001b[34m(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, write_empty_chunks, zarr_version, zarr_format, meta_array, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, storage_options, config, **kwargs)\u001b[39m\n\u001b[32m 1061\u001b[39m warnings.warn(\u001b[38;5;167;01mUserWarning\u001b[39;00m(msg), stacklevel=\u001b[32m1\u001b[39m)\n\u001b[32m 1063\u001b[39m config_parsed = ArrayConfig.from_dict(config_dict)\n\u001b[32m-> \u001b[39m\u001b[32m1065\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m AsyncArray._create(\n\u001b[32m 1066\u001b[39m store_path,\n\u001b[32m 1067\u001b[39m shape=shape,\n\u001b[32m 1068\u001b[39m chunks=chunks,\n\u001b[32m 1069\u001b[39m dtype=dtype,\n\u001b[32m 1070\u001b[39m compressor=compressor,\n\u001b[32m 1071\u001b[39m fill_value=fill_value,\n\u001b[32m 1072\u001b[39m overwrite=overwrite,\n\u001b[32m 1073\u001b[39m filters=filters,\n\u001b[32m 1074\u001b[39m dimension_separator=dimension_separator,\n\u001b[32m 1075\u001b[39m order=order,\n\u001b[32m 1076\u001b[39m zarr_format=zarr_format,\n\u001b[32m 1077\u001b[39m chunk_shape=chunk_shape,\n\u001b[32m 1078\u001b[39m chunk_key_encoding=chunk_key_encoding,\n\u001b[32m 1079\u001b[39m codecs=codecs,\n\u001b[32m 1080\u001b[39m dimension_names=dimension_names,\n\u001b[32m 1081\u001b[39m attributes=attributes,\n\u001b[32m 1082\u001b[39m config=config_parsed,\n\u001b[32m 1083\u001b[39m **kwargs,\n\u001b[32m 1084\u001b[39m )\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/array.py:608\u001b[39m, in \u001b[36mAsyncArray._create\u001b[39m\u001b[34m(cls, store, shape, dtype, zarr_format, fill_value, attributes, chunk_shape, chunk_key_encoding, codecs, dimension_names, chunks, dimension_separator, order, filters, compressor, overwrite, data, config)\u001b[39m\n\u001b[32m 605\u001b[39m _warn_order_kwarg()\n\u001b[32m 606\u001b[39m config_parsed = replace(config_parsed, order=order)\n\u001b[32m--> \u001b[39m\u001b[32m608\u001b[39m result = \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mcls\u001b[39m._create_v3(\n\u001b[32m 609\u001b[39m store_path,\n\u001b[32m 610\u001b[39m shape=shape,\n\u001b[32m 611\u001b[39m dtype=dtype_parsed,\n\u001b[32m 612\u001b[39m chunk_shape=_chunks,\n\u001b[32m 613\u001b[39m fill_value=fill_value,\n\u001b[32m 614\u001b[39m chunk_key_encoding=chunk_key_encoding,\n\u001b[32m 615\u001b[39m codecs=codecs,\n\u001b[32m 616\u001b[39m dimension_names=dimension_names,\n\u001b[32m 617\u001b[39m attributes=attributes,\n\u001b[32m 618\u001b[39m overwrite=overwrite,\n\u001b[32m 
619\u001b[39m config=config_parsed,\n\u001b[32m 620\u001b[39m )\n\u001b[32m 621\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m zarr_format == \u001b[32m2\u001b[39m:\n\u001b[32m 622\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m codecs \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/array.py:730\u001b[39m, in \u001b[36mAsyncArray._create_v3\u001b[39m\u001b[34m(cls, store_path, shape, dtype, chunk_shape, config, fill_value, chunk_key_encoding, codecs, dimension_names, attributes, overwrite)\u001b[39m\n\u001b[32m 728\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m ensure_no_existing_node(store_path, zarr_format=\u001b[32m3\u001b[39m)\n\u001b[32m 729\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m730\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m ensure_no_existing_node(store_path, zarr_format=\u001b[32m3\u001b[39m)\n\u001b[32m 732\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(chunk_key_encoding, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[32m 733\u001b[39m chunk_key_encoding = (\n\u001b[32m 734\u001b[39m V2ChunkKeyEncoding(separator=chunk_key_encoding[\u001b[32m1\u001b[39m])\n\u001b[32m 735\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m chunk_key_encoding[\u001b[32m0\u001b[39m] == \u001b[33m\"\u001b[39m\u001b[33mv2\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 736\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m DefaultChunkKeyEncoding(separator=chunk_key_encoding[\u001b[32m1\u001b[39m])\n\u001b[32m 737\u001b[39m )\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/storage/_common.py:414\u001b[39m, in \u001b[36mensure_no_existing_node\u001b[39m\u001b[34m(store_path, zarr_format)\u001b[39m\n\u001b[32m 411\u001b[39m extant_node = \u001b[38;5;28;01mawait\u001b[39;00m _contains_node_v3(store_path)\n\u001b[32m 413\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m extant_node == \u001b[33m\"\u001b[39m\u001b[33marray\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m--> \u001b[39m\u001b[32m414\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m ContainsArrayError(store_path.store, store_path.path)\n\u001b[32m 415\u001b[39m \u001b[38;5;28;01melif\u001b[39;00m extant_node == \u001b[33m\"\u001b[39m\u001b[33mgroup\u001b[39m\u001b[33m\"\u001b[39m:\n\u001b[32m 416\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m ContainsGroupError(store_path.store, store_path.path)\n", - "\u001b[31mContainsArrayError\u001b[39m: An array exists in store LocalStore('file://test.zarr') at path ''." - ] + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ "import zarr\n", - "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')\n", + "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr', mode = 'w')\n", "z" ] }, + { + "cell_type": "markdown", + "id": "03206799", + "metadata": {}, + "source": [ + "`.info` provides a summary of the array's properties, including shape, data type, and compression settings.\n" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "id": "0f39867a", "metadata": {}, "outputs": [ @@ -101,7 +115,7 @@ "No. 
bytes : 16000 (15.6K)" ] }, - "execution_count": 2, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -112,7 +126,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "id": "dbe47985", "metadata": {}, "outputs": [ @@ -122,7 +136,7 @@ "np.float64(0.0)" ] }, - "execution_count": 3, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } @@ -141,7 +155,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "id": "7d905f06", "metadata": {}, "outputs": [ @@ -151,7 +165,7 @@ "array(0.)" ] }, - "execution_count": 4, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -170,7 +184,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "id": "1ccc28b6", "metadata": {}, "outputs": [ @@ -186,7 +200,7 @@ " [ 1., 1., 1., ..., 1., 1., 1.]], shape=(40, 50))" ] }, - "execution_count": 5, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } From 9a4e97c684e784f0fd5b582222e7ae9bbadbdbee Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:39:42 -0600 Subject: [PATCH 09/20] more organized --- intermediate/intro-to-zarr.ipynb | 173 ++++++++++++++++++++++++------- 1 file changed, 138 insertions(+), 35 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 3a619044..25644a21 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -217,14 +217,14 @@ "id": "c6a059cc", "metadata": {}, "source": [ - "##### Attributes\n", + "#### Attributes\n", "\n", "We can attach arbitrary metadata to our Array via attributes:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 35, "id": "859c9cfe", "metadata": {}, "outputs": [ @@ -256,7 +256,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 28, "id": "1bbc935c", "metadata": {}, "outputs": [ @@ -266,7 +266,7 @@ "LocalStore('file://test.zarr')" ] }, - "execution_count": 7, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -277,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 30, "id": "51953f01", "metadata": {}, "outputs": [ @@ -294,12 +294,98 @@ "│   │   ├── \u001b[00m3\u001b[0m\n", "│   │   └── \u001b[00m4\u001b[0m\n", "│   ├── \u001b[01;34m1\u001b[0m\n", - "│   │   ├── \u001b[00m0\u001b[0m\n" + "│   │   ├── \u001b[00m0\u001b[0m\n", + "│   │   ├── \u001b[00m1\u001b[0m\n", + "│   │   ├── \u001b[00m2\u001b[0m\n", + "│   │   ├── \u001b[00m3\u001b[0m\n", + "│   │   └── \u001b[00m4\u001b[0m\n", + "│   ├── \u001b[01;34m2\u001b[0m\n", + "│   │   ├── \u001b[00m0\u001b[0m\n", + "│   │   ├── \u001b[00m1\u001b[0m\n", + "│   │   ├── \u001b[00m2\u001b[0m\n", + "│   │   ├── \u001b[00m3\u001b[0m\n", + "│   │   └── \u001b[00m4\u001b[0m\n", + "│   └── \u001b[01;34m3\u001b[0m\n", + "│   ├── \u001b[00m0\u001b[0m\n", + "│   ├── \u001b[00m1\u001b[0m\n", + "│   ├── \u001b[00m2\u001b[0m\n", + "│   ├── \u001b[00m3\u001b[0m\n", + "│   └── \u001b[00m4\u001b[0m\n", + "└── \u001b[00mzarr.json\u001b[0m\n", + "\n", + "6 directories, 21 files\n" ] } ], "source": [ - "!tree -a test.zarr | head" + "!tree -a test.zarr" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "fbc51436", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"shape\": [\n", + " 40,\n", + " 50\n", + " ],\n", + " \"data_type\": \"float64\",\n", + " \"chunk_grid\": {\n", + " 
\"name\": \"regular\",\n", + " \"configuration\": {\n", + " \"chunk_shape\": [\n", + " 10,\n", + " 10\n", + " ]\n", + " }\n", + " },\n", + " \"chunk_key_encoding\": {\n", + " \"name\": \"default\",\n", + " \"configuration\": {\n", + " \"separator\": \"/\"\n", + " }\n", + " },\n", + " \"fill_value\": 0.0,\n", + " \"codecs\": [\n", + " {\n", + " \"name\": \"bytes\",\n", + " \"configuration\": {\n", + " \"endian\": \"little\"\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"zstd\",\n", + " \"configuration\": {\n", + " \"level\": 0,\n", + " \"checksum\": false\n", + " }\n", + " }\n", + " ],\n", + " \"attributes\": {},\n", + " \"zarr_format\": 3,\n", + " \"node_type\": \"array\",\n", + " \"storage_transformers\": []\n", + "}" + ] + } + ], + "source": [ + "!cat test.zarr/zarr.json" + ] + }, + { + "cell_type": "markdown", + "id": "ead8421e", + "metadata": {}, + "source": [ + "### Hierarchical Groups" ] }, { @@ -307,12 +393,12 @@ "id": "35f5384e", "metadata": {}, "source": [ - "To create groups in your store, use the `create_group` method after creating a root group. Here, we’ll create two groups, `temp` and `precip`." + "Zarr allows you to create hierarchical groups, similar to directories. To create groups in your store, use the `create_group` method after creating a root group. Here, we’ll create two groups, `temp` and `precip`." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 36, "id": "e3c377a0", "metadata": {}, "outputs": [], @@ -335,7 +421,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 37, "id": "eecad1a6", "metadata": {}, "outputs": [ @@ -349,7 +435,7 @@ " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)" ] }, - "execution_count": 10, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -370,8 +456,33 @@ }, { "cell_type": "code", - "execution_count": null, - "id": "164febda", + "execution_count": 40, + "id": "2f651707", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Name : \n", + "Type : Group\n", + "Zarr format : 3\n", + "Read-only : False\n", + "Store type : MemoryStore" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "root.info" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "2363137a", "metadata": {}, "outputs": [ { @@ -385,7 +496,7 @@ "\u001b[31mModuleNotFoundError\u001b[39m: No module named 'rich'", "\nThe above exception was the direct cause of the following exception:\n", "\u001b[31mImportError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[11]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mroot\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[42]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mroot\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/group.py:2361\u001b[39m, in \u001b[36mGroup.tree\u001b[39m\u001b[34m(self, expand, level)\u001b[39m\n\u001b[32m 2342\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m 
\u001b[39m\u001b[34mtree\u001b[39m(\u001b[38;5;28mself\u001b[39m, expand: \u001b[38;5;28mbool\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m, level: \u001b[38;5;28mint\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m) -> Any:\n\u001b[32m 2343\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 2344\u001b[39m \u001b[33;03m Return a tree-like representation of a hierarchy.\u001b[39;00m\n\u001b[32m 2345\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 2359\u001b[39m \u001b[33;03m A pretty-printable object displaying the hierarchy.\u001b[39;00m\n\u001b[32m 2360\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m2361\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_sync\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_async_group\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m=\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m=\u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:208\u001b[39m, in \u001b[36mSyncMixin._sync\u001b[39m\u001b[34m(self, coroutine)\u001b[39m\n\u001b[32m 205\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34m_sync\u001b[39m(\u001b[38;5;28mself\u001b[39m, coroutine: Coroutine[Any, Any, T]) -> T:\n\u001b[32m 206\u001b[39m \u001b[38;5;66;03m# TODO: refactor this to to take *args and **kwargs and pass those to the method\u001b[39;00m\n\u001b[32m 207\u001b[39m \u001b[38;5;66;03m# this should allow us to better type the sync wrapper\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m208\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 209\u001b[39m \u001b[43m \u001b[49m\u001b[43mcoroutine\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 210\u001b[39m \u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43masync.timeout\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 211\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n", "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:163\u001b[39m, in \u001b[36msync\u001b[39m\u001b[34m(coro, loop, timeout)\u001b[39m\n\u001b[32m 160\u001b[39m return_result = \u001b[38;5;28mnext\u001b[39m(\u001b[38;5;28miter\u001b[39m(finished)).result()\n\u001b[32m 162\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[32m--> \u001b[39m\u001b[32m163\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[32m 164\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 165\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m return_result\n", @@ -397,33 +508,19 @@ } ], "source": [ - "root.tree()\n" + "root.tree(expand=True)\n" ] }, { - "cell_type": "code", - "execution_count": null, - "id": "2363137a", + "cell_type": "markdown", + "id": "a63ebdd7", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": 
[ - "Name : \n", - "Type : Group\n", - "Zarr format : 3\n", - "Read-only : False\n", - "Store type : MemoryStore" - ] - }, - "execution_count": 81, - "metadata": {}, - "output_type": "execute_result" - } - ], "source": [ + "### How to Examine and Modify the Chunk Shape\n", "\n", - "root.info" + "If your data is sufficiently large, Zarr will chose a chunksize for you.\n", + "\n", + "\n" ] }, { @@ -539,6 +636,12 @@ "So far we have only been dealing in single array Zarr data stores. In this next example, we will create a zarr store with multiple arrays and then consolidate metadata. The speed up is significant when dealing in remote storage options, which we will see in the following example on accessing cloud storage." ] }, + { + "cell_type": "markdown", + "id": "cdb3f822", + "metadata": {}, + "source": [] + }, { "cell_type": "code", "execution_count": null, From a12435ee933062b6dec408d2fa59d226da1f2e2a Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:53:24 -0600 Subject: [PATCH 10/20] sharding --- intermediate/intro-to-zarr.ipynb | 152 ++++++++++++++++++++++++++++--- 1 file changed, 137 insertions(+), 15 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 25644a21..f9c2b3a3 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -516,13 +516,147 @@ "id": "a63ebdd7", "metadata": {}, "source": [ - "### How to Examine and Modify the Chunk Shape\n", + "#### Chunking\n", + "Chunking is the process of dividing the data arrays into smaller pieces. This allows for parallel processing and efficient storage.\n", "\n", - "If your data is sufficiently large, Zarr will chose a chunksize for you.\n", + "One of the important parameters in Zarr is the chunk shape, which determines how the data is divided into smaller, manageable pieces. This is crucial for performance, especially when working with large datasets.\n", "\n", + "To examine the chunk shape of a Zarr array, you can use the `chunks` attribute. This will show you the size of each chunk in each dimension." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "cd5e7ec0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(10, 10)" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "z.chunks" + ] + }, + { + "cell_type": "markdown", + "id": "a1f62ab3", + "metadata": {}, + "source": [ + "When selecting chunk shapes, we need to keep in mind two constraints:\n", + "\n", + "- Concurrent writes are possible as long as different processes write to separate chunks, enabling highly parallel data writing. \n", + "- When reading data, if any piece of the chunk is needed, the entire chunk has to be loaded. \n", + "\n", + "The optimal chunk shape will depend on how you want to access the data. 
E.g., for a 2-dimensional array, if you only ever take slices along the first dimension, then chunk across the second dimension.\n", + "\n", + "Here we will compare two different chunking strategies.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "b7929741", + "metadata": {}, + "outputs": [], + "source": [ + "c = zarr.create(shape=(200, 200, 200), chunks=(1, 200, 200), dtype='f8', store='c.zarr')\n", + "c[:] = np.random.randn(*c.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "68d6d671", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 112 ms, sys: 55.4 ms, total: 167 ms\n", + "Wall time: 67.5 ms\n" + ] + } + ], + "source": [ + "%time _ = c[:, 0, 0]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "9ad7e371", + "metadata": {}, + "outputs": [], + "source": [ + "d = zarr.create(shape=(200, 200, 200), chunks=(200, 200, 1), dtype='f8', store='d.zarr')\n", + "d[:] = np.random.randn(*d.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "51094774", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 1.63 ms, sys: 1.3 ms, total: 2.93 ms\n", + "Wall time: 2.14 ms\n" + ] + } + ], + "source": [ + "%time _ = d[:, 0, 0]\n" + ] + }, + { + "cell_type": "markdown", + "id": "3fa2b41a", + "metadata": {}, + "source": [ + "### Sharding\n", + "When working with large arrays and small chunks, Zarr’s sharding feature can improve storage efficiency and performance. Instead of writing each chunk to a separate file—which can overwhelm file systems and cloud object stores—sharding groups multiple chunks into a single storage object.\n", + "\n", + "Why Use Sharding?\n", + "\n", + "- File systems struggle with too many small files.\n", + "- Small files (e.g., 1 MB or less) may waste space due to filesystem block size.\n", + "- Object storage systems (e.g., S3) can slow down with a high number of objects.\n", + "With sharding, you choose:\n", "\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fec37ba", + "metadata": {}, + "outputs": [], + "source": [ + "import zarr\n", + "\n", + "z6 = zarr.create_array(\n", + " store={},\n", + " shape=(10000, 10000, 1000),\n", + " chunks=(100, 100, 100),\n", + " shards=(1000, 1000, 1000),\n", + " dtype='uint8'\n", + ")\n", + "\n", + "z6.info" + ] + }, { "cell_type": "markdown", "id": "1e0d1a8e", @@ -636,12 +770,6 @@ "So far we have only been dealing in single array Zarr data stores. In this next example, we will create a zarr store with multiple arrays and then consolidate metadata. The speed up is significant when dealing in remote storage options, which we will see in the following example on accessing cloud storage." ] }, - { - "cell_type": "markdown", - "id": "cdb3f822", - "metadata": {}, - "source": [] - }, { "cell_type": "code", "execution_count": null, @@ -748,7 +876,7 @@ "id": "c6454acf", "metadata": {}, "source": [ - "## Object Storage as a Zarr Store\n", + "### Object Storage as a Zarr Store\n", "\n", "Zarr’s layout (many files/chunks per array) maps perfectly onto object storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. 
Each chunk is stored as a separate object, enabling distributed reads/writes.\n", "\n" @@ -2294,12 +2422,6 @@ "- [Cloud Optimized Geospatial Formats](https://guide.cloudnativegeo.org/zarr/zarr-in-practice.html)\n", "- [Scalable and Computationally Reproducible Approaches to Arctic Research](https://learning.nceas.ucsb.edu/2025-04-arctic/sections/zarr.html)\n" ] - }, - { - "cell_type": "markdown", - "id": "5ed9088a", - "metadata": {}, - "source": [] } ], "metadata": { From ab1d366be690d7c315529eac3c50a370bce9ac96 Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:55:47 -0600 Subject: [PATCH 11/20] sharding added --- intermediate/intro-to-zarr.ipynb | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index f9c2b3a3..fe0f1cfd 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -637,6 +637,14 @@ "\n" ] }, + { + "cell_type": "markdown", + "id": "139134af", + "metadata": {}, + "source": [ + "This example shows how to create a sharded Zarr array with a chunk size of `(100, 100, 100)` and a shard size of `(1000, 1000, 1000)`. This means that each shard will contain 10 chunks, and each chunk will be of size `(100, 100, 100)`.\n" + ] + }, { "cell_type": "code", "execution_count": null, @@ -657,6 +665,17 @@ "z6.info" ] }, + { + "cell_type": "markdown", + "id": "28877070", + "metadata": {}, + "source": [ + "\n", + "```{tip}\n", + "Choose shard and chunk sizes that balance I/O performance and manageability for your filesystem or cloud backend.\n", + "```" + ] + }, { "cell_type": "markdown", "id": "1e0d1a8e", From 59fe17b130ebe9d58e173bee37339a075ae88b26 Mon Sep 17 00:00:00 2001 From: Negin Sobhani Date: Fri, 4 Jul 2025 14:57:01 -0600 Subject: [PATCH 12/20] zarr added --- intermediate/intro-to-zarr.ipynb | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index fe0f1cfd..aba040ce 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -2439,8 +2439,15 @@ "\n", "- [Zarr Documentation](https://zarr.readthedocs.io/en/stable/)\n", "- [Cloud Optimized Geospatial Formats](https://guide.cloudnativegeo.org/zarr/zarr-in-practice.html)\n", - "- [Scalable and Computationally Reproducible Approaches to Arctic Research](https://learning.nceas.ucsb.edu/2025-04-arctic/sections/zarr.html)\n" + "- [Scalable and Computationally Reproducible Approaches to Arctic Research](https://learning.nceas.ucsb.edu/2025-04-arctic/sections/zarr.html)\n", + "- [Zarr Cloud Native Geospatial Tutorial](https://github.com/zarr-developers/tutorials/blob/main/zarr_cloud_native_geospatial_2022.ipynb)" ] + }, + { + "cell_type": "markdown", + "id": "2cd8bea6", + "metadata": {}, + "source": [] } ], "metadata": { From 5856441919f8ed7f92a6099ec293bd2de6fa6fc0 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 4 Jul 2025 21:07:19 +0000 Subject: [PATCH 13/20] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/indexing/advanced-indexing.ipynb | 8 +- intermediate/intro-to-zarr.ipynb | 2036 +---------------- 2 files changed, 108 insertions(+), 1936 deletions(-) diff --git a/intermediate/indexing/advanced-indexing.ipynb b/intermediate/indexing/advanced-indexing.ipynb index 0a0cc147..27de401e 100644 --- a/intermediate/indexing/advanced-indexing.ipynb +++ 
b/intermediate/indexing/advanced-indexing.ipynb @@ -414,11 +414,6 @@ } ], "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -428,8 +423,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.4" + "pygments_lexer": "ipython3" }, "toc": { "base_numbering": 1, diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index aba040ce..d8ab5ac8 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "8253fe2d", + "id": "0", "metadata": {}, "source": [ "# Introduction to Zarr\n", @@ -40,7 +40,7 @@ }, { "cell_type": "markdown", - "id": "89a8f0ec", + "id": "1", "metadata": { "vscode": { "languageId": "plaintext" @@ -62,30 +62,20 @@ }, { "cell_type": "code", - "execution_count": 16, - "id": "ae9c38ed", + "execution_count": null, + "id": "2", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import zarr\n", - "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr', mode = 'w')\n", + "\n", + "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr', mode='w')\n", "z" ] }, { "cell_type": "markdown", - "id": "03206799", + "id": "3", "metadata": {}, "source": [ "`.info` provides a summary of the array's properties, including shape, data type, and compression settings.\n" @@ -93,61 +83,27 @@ }, { "cell_type": "code", - "execution_count": 17, - "id": "0f39867a", + "execution_count": null, + "id": "4", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Type : Array\n", - "Zarr format : 3\n", - "Data type : DataType.float64\n", - "Fill value : 0.0\n", - "Shape : (40, 50)\n", - "Chunk shape : (10, 10)\n", - "Order : C\n", - "Read-only : False\n", - "Store type : LocalStore\n", - "Filters : ()\n", - "Serializer : BytesCodec(endian=)\n", - "Compressors : (ZstdCodec(level=0, checksum=False),)\n", - "No. bytes : 16000 (15.6K)" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.info" ] }, { "cell_type": "code", - "execution_count": 18, - "id": "dbe47985", + "execution_count": null, + "id": "5", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "np.float64(0.0)" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.fill_value" ] }, { "cell_type": "markdown", - "id": "f5dcee68", + "id": "6", "metadata": {}, "source": [ "No data has been written to the array yet. If we try to access the data, we will get a fill value: " @@ -155,28 +111,17 @@ }, { "cell_type": "code", - "execution_count": 19, - "id": "7d905f06", + "execution_count": null, + "id": "7", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array(0.)" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "z[0, 0]\n" + "z[0, 0]" ] }, { "cell_type": "markdown", - "id": "a6091ba5", + "id": "8", "metadata": {}, "source": [ "This is how we assign data to the array. When we do this it gets written immediately." 
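The context line above notes that assigning to the array writes chunks to the store immediately. As a quick illustration, here is a minimal sketch of reading that data back; it assumes the same `zarr-python` v3 API and the `test.zarr` store created earlier in this notebook:

```python
import zarr

# Re-open the existing store read-only; nothing is loaded until it is sliced.
z2 = zarr.open_array("test.zarr", mode="r")

print(z2.chunks)  # (10, 10), the chunk shape chosen at creation time
print(z2[0, :5])  # reads only the chunks that overlap this selection
```

On disk, each written chunk appears as a separate object under `test.zarr/c/`, which is what makes concurrent, chunk-aligned writes safe.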
@@ -184,29 +129,13 @@ }, { "cell_type": "code", - "execution_count": 25, - "id": "1ccc28b6", + "execution_count": null, + "id": "9", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[ 0., 1., 2., ..., 47., 48., 49.],\n", - " [ 1., 1., 1., ..., 1., 1., 1.],\n", - " [ 1., 1., 1., ..., 1., 1., 1.],\n", - " ...,\n", - " [ 1., 1., 1., ..., 1., 1., 1.],\n", - " [ 1., 1., 1., ..., 1., 1., 1.],\n", - " [ 1., 1., 1., ..., 1., 1., 1.]], shape=(40, 50))" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import numpy as np\n", + "\n", "z[:] = 1\n", "z[0, :] = np.arange(50)\n", "z[:]" @@ -214,7 +143,7 @@ }, { "cell_type": "markdown", - "id": "c6a059cc", + "id": "10", "metadata": {}, "source": [ "#### Attributes\n", @@ -224,18 +153,10 @@ }, { "cell_type": "code", - "execution_count": 35, - "id": "859c9cfe", + "execution_count": null, + "id": "11", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'units': 'm/s', 'standard_name': 'wind_speed'}\n" - ] - } - ], + "outputs": [], "source": [ "z.attrs['units'] = 'm/s'\n", "z.attrs['standard_name'] = 'wind_speed'\n", @@ -244,7 +165,7 @@ }, { "cell_type": "markdown", - "id": "23885ea0", + "id": "12", "metadata": {}, "source": [ "### Zarr Data Storage\n", @@ -256,133 +177,37 @@ }, { "cell_type": "code", - "execution_count": 28, - "id": "1bbc935c", + "execution_count": null, + "id": "13", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "LocalStore('file://test.zarr')" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.store" ] }, { "cell_type": "code", - "execution_count": 30, - "id": "51953f01", + "execution_count": null, + "id": "14", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[01;34mtest.zarr\u001b[0m\n", - "├── \u001b[01;34mc\u001b[0m\n", - "│   ├── \u001b[01;34m0\u001b[0m\n", - "│   │   ├── \u001b[00m0\u001b[0m\n", - "│   │   ├── \u001b[00m1\u001b[0m\n", - "│   │   ├── \u001b[00m2\u001b[0m\n", - "│   │   ├── \u001b[00m3\u001b[0m\n", - "│   │   └── \u001b[00m4\u001b[0m\n", - "│   ├── \u001b[01;34m1\u001b[0m\n", - "│   │   ├── \u001b[00m0\u001b[0m\n", - "│   │   ├── \u001b[00m1\u001b[0m\n", - "│   │   ├── \u001b[00m2\u001b[0m\n", - "│   │   ├── \u001b[00m3\u001b[0m\n", - "│   │   └── \u001b[00m4\u001b[0m\n", - "│   ├── \u001b[01;34m2\u001b[0m\n", - "│   │   ├── \u001b[00m0\u001b[0m\n", - "│   │   ├── \u001b[00m1\u001b[0m\n", - "│   │   ├── \u001b[00m2\u001b[0m\n", - "│   │   ├── \u001b[00m3\u001b[0m\n", - "│   │   └── \u001b[00m4\u001b[0m\n", - "│   └── \u001b[01;34m3\u001b[0m\n", - "│   ├── \u001b[00m0\u001b[0m\n", - "│   ├── \u001b[00m1\u001b[0m\n", - "│   ├── \u001b[00m2\u001b[0m\n", - "│   ├── \u001b[00m3\u001b[0m\n", - "│   └── \u001b[00m4\u001b[0m\n", - "└── \u001b[00mzarr.json\u001b[0m\n", - "\n", - "6 directories, 21 files\n" - ] - } - ], + "outputs": [], "source": [ "!tree -a test.zarr" ] }, { "cell_type": "code", - "execution_count": 34, - "id": "fbc51436", + "execution_count": null, + "id": "15", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"shape\": [\n", - " 40,\n", - " 50\n", - " ],\n", - " \"data_type\": \"float64\",\n", - " \"chunk_grid\": {\n", - " \"name\": \"regular\",\n", - " \"configuration\": {\n", - " \"chunk_shape\": [\n", - " 10,\n", - " 10\n", - " ]\n", - " }\n", - " },\n", 
- " \"chunk_key_encoding\": {\n", - " \"name\": \"default\",\n", - " \"configuration\": {\n", - " \"separator\": \"/\"\n", - " }\n", - " },\n", - " \"fill_value\": 0.0,\n", - " \"codecs\": [\n", - " {\n", - " \"name\": \"bytes\",\n", - " \"configuration\": {\n", - " \"endian\": \"little\"\n", - " }\n", - " },\n", - " {\n", - " \"name\": \"zstd\",\n", - " \"configuration\": {\n", - " \"level\": 0,\n", - " \"checksum\": false\n", - " }\n", - " }\n", - " ],\n", - " \"attributes\": {},\n", - " \"zarr_format\": 3,\n", - " \"node_type\": \"array\",\n", - " \"storage_transformers\": []\n", - "}" - ] - } - ], + "outputs": [], "source": [ "!cat test.zarr/zarr.json" ] }, { "cell_type": "markdown", - "id": "ead8421e", + "id": "16", "metadata": {}, "source": [ "### Hierarchical Groups" @@ -390,7 +215,7 @@ }, { "cell_type": "markdown", - "id": "35f5384e", + "id": "17", "metadata": {}, "source": [ "Zarr allows you to create hierarchical groups, similar to directories. To create groups in your store, use the `create_group` method after creating a root group. Here, we’ll create two groups, `temp` and `precip`." @@ -398,21 +223,21 @@ }, { "cell_type": "code", - "execution_count": 36, - "id": "e3c377a0", + "execution_count": null, + "id": "18", "metadata": {}, "outputs": [], "source": [ "root = zarr.group()\n", "temp = root.create_group('temp')\n", "precip = root.create_group('precip')\n", - "t2m = temp.create_array('t2m', shape=(100,100), chunks=(10,10), dtype='i4')\n", - "prcp = precip.create_array('prcp', shape=(1000,1000), chunks=(10,10), dtype='i4')" + "t2m = temp.create_array('t2m', shape=(100, 100), chunks=(10, 10), dtype='i4')\n", + "prcp = precip.create_array('prcp', shape=(1000, 1000), chunks=(10, 10), dtype='i4')" ] }, { "cell_type": "markdown", - "id": "45fb5bc2", + "id": "19", "metadata": {}, "source": [ "Groups can easily be accessed by name and index.\n", @@ -421,25 +246,10 @@ }, { "cell_type": "code", - "execution_count": 37, - "id": "eecad1a6", + "execution_count": null, + "id": "20", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "root['temp']\n", "root['temp/t2m'][:, 3]" @@ -447,7 +257,7 @@ }, { "cell_type": "markdown", - "id": "f52c70ba", + "id": "21", "metadata": {}, "source": [ "To get a look at your overall dataset, the `tree` and `info` methods are helpful.\n", @@ -456,64 +266,27 @@ }, { "cell_type": "code", - "execution_count": 40, - "id": "2f651707", + "execution_count": null, + "id": "22", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Name : \n", - "Type : Group\n", - "Zarr format : 3\n", - "Read-only : False\n", - "Store type : MemoryStore" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "root.info" ] }, { "cell_type": "code", - "execution_count": 42, - "id": "2363137a", + "execution_count": null, + "id": "23", "metadata": {}, - "outputs": [ - { - "ename": "ImportError", - "evalue": "'rich' is required for Group.tree", - "output_type": "error", - "traceback": [ - 
"\u001b[31m---------------------------------------------------------------------------\u001b[39m", - "\u001b[31mModuleNotFoundError\u001b[39m Traceback (most recent call last)", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/_tree.py:9\u001b[39m\n\u001b[32m 8\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m----> \u001b[39m\u001b[32m9\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\n\u001b[32m 10\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mconsole\u001b[39;00m\n", - "\u001b[31mModuleNotFoundError\u001b[39m: No module named 'rich'", - "\nThe above exception was the direct cause of the following exception:\n", - "\u001b[31mImportError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[42]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m \u001b[43mroot\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/group.py:2361\u001b[39m, in \u001b[36mGroup.tree\u001b[39m\u001b[34m(self, expand, level)\u001b[39m\n\u001b[32m 2342\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mtree\u001b[39m(\u001b[38;5;28mself\u001b[39m, expand: \u001b[38;5;28mbool\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m, level: \u001b[38;5;28mint\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m) -> Any:\n\u001b[32m 2343\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 2344\u001b[39m \u001b[33;03m Return a tree-like representation of a hierarchy.\u001b[39;00m\n\u001b[32m 2345\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 2359\u001b[39m \u001b[33;03m A pretty-printable object displaying the hierarchy.\u001b[39;00m\n\u001b[32m 2360\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m2361\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_sync\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_async_group\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtree\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m=\u001b[49m\u001b[43mexpand\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m=\u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:208\u001b[39m, in \u001b[36mSyncMixin._sync\u001b[39m\u001b[34m(self, coroutine)\u001b[39m\n\u001b[32m 205\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34m_sync\u001b[39m(\u001b[38;5;28mself\u001b[39m, coroutine: Coroutine[Any, Any, T]) -> T:\n\u001b[32m 206\u001b[39m \u001b[38;5;66;03m# TODO: refactor this to to take *args and **kwargs and pass those to the method\u001b[39;00m\n\u001b[32m 207\u001b[39m \u001b[38;5;66;03m# this should allow us to better type the sync wrapper\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m208\u001b[39m 
\u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 209\u001b[39m \u001b[43m \u001b[49m\u001b[43mcoroutine\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 210\u001b[39m \u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[43m=\u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m.\u001b[49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43masync.timeout\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 211\u001b[39m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:163\u001b[39m, in \u001b[36msync\u001b[39m\u001b[34m(coro, loop, timeout)\u001b[39m\n\u001b[32m 160\u001b[39m return_result = \u001b[38;5;28mnext\u001b[39m(\u001b[38;5;28miter\u001b[39m(finished)).result()\n\u001b[32m 162\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[32m--> \u001b[39m\u001b[32m163\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[32m 164\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 165\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m return_result\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/sync.py:119\u001b[39m, in \u001b[36m_runner\u001b[39m\u001b[34m(coro)\u001b[39m\n\u001b[32m 114\u001b[39m \u001b[38;5;250m\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 115\u001b[39m \u001b[33;03mAwait a coroutine and return the result of running it. If awaiting the coroutine raises an\u001b[39;00m\n\u001b[32m 116\u001b[39m \u001b[33;03mexception, the exception will be returned.\u001b[39;00m\n\u001b[32m 117\u001b[39m \u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 118\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m119\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m coro\n\u001b[32m 120\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n\u001b[32m 121\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ex\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/group.py:1583\u001b[39m, in \u001b[36mAsyncGroup.tree\u001b[39m\u001b[34m(self, expand, level)\u001b[39m\n\u001b[32m 1564\u001b[39m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mtree\u001b[39m(\u001b[38;5;28mself\u001b[39m, expand: \u001b[38;5;28mbool\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m, level: \u001b[38;5;28mint\u001b[39m | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m) -> Any:\n\u001b[32m 1565\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 1566\u001b[39m \u001b[33;03m Return a tree-like representation of a hierarchy.\u001b[39;00m\n\u001b[32m 1567\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 1581\u001b[39m \u001b[33;03m A pretty-printable object displaying the hierarchy.\u001b[39;00m\n\u001b[32m 1582\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1583\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m 
\u001b[39m\u001b[34;01mzarr\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mcore\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01m_tree\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m group_tree_async\n\u001b[32m 1585\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m expand \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m 1586\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m'\u001b[39m\u001b[33mexpand\u001b[39m\u001b[33m'\u001b[39m\u001b[33m is not yet implemented.\u001b[39m\u001b[33m\"\u001b[39m)\n", - "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/core/_tree.py:13\u001b[39m\n\u001b[32m 11\u001b[39m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mrich\u001b[39;00m\u001b[34;01m.\u001b[39;00m\u001b[34;01mtree\u001b[39;00m\n\u001b[32m 12\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[32m---> \u001b[39m\u001b[32m13\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[33m\"\u001b[39m\u001b[33m'\u001b[39m\u001b[33mrich\u001b[39m\u001b[33m'\u001b[39m\u001b[33m is required for Group.tree\u001b[39m\u001b[33m\"\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01me\u001b[39;00m\n\u001b[32m 16\u001b[39m \u001b[38;5;28;01mclass\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mTreeRepr\u001b[39;00m:\n\u001b[32m 17\u001b[39m \u001b[38;5;250m \u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m 18\u001b[39m \u001b[33;03m A simple object with a tree-like repr for the Zarr Group.\u001b[39;00m\n\u001b[32m 19\u001b[39m \n\u001b[32m 20\u001b[39m \u001b[33;03m Note that this object and it's implementation isn't considered part\u001b[39;00m\n\u001b[32m 21\u001b[39m \u001b[33;03m of Zarr's public API.\u001b[39;00m\n\u001b[32m 22\u001b[39m \u001b[33;03m \"\"\"\u001b[39;00m\n", - "\u001b[31mImportError\u001b[39m: 'rich' is required for Group.tree" - ] - } - ], + "outputs": [], "source": [ - "root.tree(expand=True)\n" + "root.tree(expand=True)" ] }, { "cell_type": "markdown", - "id": "a63ebdd7", + "id": "24", "metadata": {}, "source": [ "#### Chunking\n", @@ -526,28 +299,17 @@ }, { "cell_type": "code", - "execution_count": 44, - "id": "cd5e7ec0", + "execution_count": null, + "id": "25", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(10, 10)" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.chunks" ] }, { "cell_type": "markdown", - "id": "a1f62ab3", + "id": "26", "metadata": {}, "source": [ "When selecting chunk shapes, we need to keep in mind two constraints:\n", @@ -562,8 +324,8 @@ }, { "cell_type": "code", - "execution_count": 46, - "id": "b7929741", + "execution_count": null, + "id": "27", "metadata": {}, "outputs": [], "source": [ @@ -573,27 +335,18 @@ }, { "cell_type": "code", - "execution_count": 47, - "id": "68d6d671", + "execution_count": null, + "id": "28", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "CPU times: user 112 ms, sys: 55.4 ms, total: 167 ms\n", - "Wall time: 67.5 ms\n" - ] - } - ], + "outputs": [], "source": [ - "%time _ = c[:, 0, 0]\n" + "%time _ = c[:, 0, 0]" ] }, { "cell_type": "code", - "execution_count": 48, 
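A quick way to sanity-check a candidate chunk shape is to compute its uncompressed footprint before creating the array; a rough sketch (the shapes and the float64 dtype match the examples in this notebook):

```python
import numpy as np

def chunk_nbytes(chunk_shape, dtype="f8"):
    """Uncompressed size of a single chunk in bytes."""
    return int(np.prod(chunk_shape)) * np.dtype(dtype).itemsize

print(chunk_nbytes((10, 10)))         # 800 bytes -- tiny, fine for a toy example
print(chunk_nbytes((1, 200, 200)))    # 320_000 bytes (~0.3 MB)
print(chunk_nbytes((200, 200, 200)))  # 64_000_000 bytes (~64 MB) if stored as a single chunk
```

Comparing this footprint against the access pattern you expect, as the `%time` comparisons in this section illustrate, usually matters more than the raw number itself.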
- "id": "9ad7e371", + "execution_count": null, + "id": "29", "metadata": {}, "outputs": [], "source": [ @@ -603,26 +356,17 @@ }, { "cell_type": "code", - "execution_count": 49, - "id": "51094774", + "execution_count": null, + "id": "30", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "CPU times: user 1.63 ms, sys: 1.3 ms, total: 2.93 ms\n", - "Wall time: 2.14 ms\n" - ] - } - ], + "outputs": [], "source": [ - "%time _ = d[:, 0, 0]\n" + "%time _ = d[:, 0, 0]" ] }, { "cell_type": "markdown", - "id": "3fa2b41a", + "id": "31", "metadata": {}, "source": [ "### Sharding\n", @@ -639,7 +383,7 @@ }, { "cell_type": "markdown", - "id": "139134af", + "id": "32", "metadata": {}, "source": [ "This example shows how to create a sharded Zarr array with a chunk size of `(100, 100, 100)` and a shard size of `(1000, 1000, 1000)`. This means that each shard will contain 10 chunks, and each chunk will be of size `(100, 100, 100)`.\n" @@ -648,7 +392,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4fec37ba", + "id": "33", "metadata": {}, "outputs": [], "source": [ @@ -659,7 +403,7 @@ " shape=(10000, 10000, 1000),\n", " chunks=(100, 100, 100),\n", " shards=(1000, 1000, 1000),\n", - " dtype='uint8'\n", + " dtype='uint8',\n", ")\n", "\n", "z6.info" @@ -667,7 +411,7 @@ }, { "cell_type": "markdown", - "id": "28877070", + "id": "34", "metadata": {}, "source": [ "\n", @@ -678,7 +422,7 @@ }, { "cell_type": "markdown", - "id": "1e0d1a8e", + "id": "35", "metadata": {}, "source": [ "#### Compressors\n", @@ -690,27 +434,16 @@ { "cell_type": "code", "execution_count": null, - "id": "5263951c", + "id": "36", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(ZstdCodec(level=0, checksum=False),)" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.compressors" ] }, { "cell_type": "markdown", - "id": "b948f73c", + "id": "37", "metadata": {}, "source": [ "If you don’t specify a compressor, by default Zarr uses the Zstandard compressor." @@ -718,7 +451,7 @@ }, { "cell_type": "markdown", - "id": "75d91cf7", + "id": "38", "metadata": {}, "source": [ "How much space was saved by compression?\n" @@ -727,42 +460,16 @@ { "cell_type": "code", "execution_count": null, - "id": "49bbc63e", + "id": "39", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Type : Array\n", - "Zarr format : 3\n", - "Data type : DataType.float64\n", - "Fill value : 0.0\n", - "Shape : (40, 50)\n", - "Chunk shape : (10, 10)\n", - "Order : C\n", - "Read-only : False\n", - "Store type : LocalStore\n", - "Filters : ()\n", - "Serializer : BytesCodec(endian=)\n", - "Compressors : (ZstdCodec(level=0, checksum=False),)\n", - "No. bytes : 16000 (15.6K)\n", - "No. bytes stored : 1216\n", - "Storage ratio : 13.2\n", - "Chunks Initialized : 20" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "z.info_complete()" ] }, { "cell_type": "markdown", - "id": "4b33663a", + "id": "40", "metadata": {}, "source": [ "You can set `compression=None` when creating a Zarr array to turn off compression. This is useful for debugging or when you want to store data without compression." 
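Two small back-of-the-envelope checks help when reading the numbers in this section; a sketch using the values from the examples above (they will differ for your own arrays and compressor settings):

```python
import numpy as np

# Chunks per shard for the sharded array above:
# 1000 / 100 = 10 chunks along each axis, i.e. 10**3 = 1000 chunks per shard.
shard_shape = (1000, 1000, 1000)
chunk_shape = (100, 100, 100)
print(int(np.prod([s // c for s, c in zip(shard_shape, chunk_shape)])))  # 1000

# The storage ratio reported by z.info_complete() is uncompressed bytes / stored bytes:
nbytes = 40 * 50 * 8    # (40, 50) float64 array -> 16000 bytes
nbytes_stored = 1216    # value reported above for this particular store
print(round(nbytes / nbytes_stored, 1))  # 13.2
```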
@@ -770,7 +477,7 @@ }, { "cell_type": "markdown", - "id": "388d7c50", + "id": "41", "metadata": {}, "source": [ "```{info}\n", @@ -780,7 +487,7 @@ }, { "cell_type": "markdown", - "id": "cd94a896", + "id": "42", "metadata": {}, "source": [ "#### Consolidated Metadata\n", @@ -792,28 +499,9 @@ { "cell_type": "code", "execution_count": null, - "id": "8498eccc", + "id": "43", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/negins/miniconda/envs/zarr_tutorial/lib/python3.13/site-packages/zarr/api/asynchronous.py:227: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.\n", - " warnings.warn(\n" - ] - }, - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "store = zarr.storage.MemoryStore()\n", "group = zarr.create_group(store=store)\n", @@ -825,7 +513,7 @@ }, { "cell_type": "markdown", - "id": "7b8d557b", + "id": "44", "metadata": {}, "source": [ "Now, if we open that group, the Group’s metadata has a zarr.core.group.ConsolidatedMetadata that can be used:" @@ -834,65 +522,20 @@ { "cell_type": "code", "execution_count": null, - "id": "57c688fc", + "id": "45", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'a': ArrayV3Metadata(shape=(1,),\n", - " data_type=,\n", - " chunk_grid=RegularChunkGrid(chunk_shape=(1,)),\n", - " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", - " separator='/'),\n", - " fill_value=np.float64(0.0),\n", - " codecs=(BytesCodec(endian=),\n", - " ZstdCodec(level=0, checksum=False)),\n", - " attributes={},\n", - " dimension_names=None,\n", - " zarr_format=3,\n", - " node_type='array',\n", - " storage_transformers=()),\n", - " 'b': ArrayV3Metadata(shape=(2, 2),\n", - " data_type=,\n", - " chunk_grid=RegularChunkGrid(chunk_shape=(2, 2)),\n", - " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", - " separator='/'),\n", - " fill_value=np.float64(0.0),\n", - " codecs=(BytesCodec(endian=),\n", - " ZstdCodec(level=0, checksum=False)),\n", - " attributes={},\n", - " dimension_names=None,\n", - " zarr_format=3,\n", - " node_type='array',\n", - " storage_transformers=()),\n", - " 'c': ArrayV3Metadata(shape=(3, 3, 3),\n", - " data_type=,\n", - " chunk_grid=RegularChunkGrid(chunk_shape=(3, 3, 3)),\n", - " chunk_key_encoding=DefaultChunkKeyEncoding(name='default',\n", - " separator='/'),\n", - " fill_value=np.float64(0.0),\n", - " codecs=(BytesCodec(endian=),\n", - " ZstdCodec(level=0, checksum=False)),\n", - " attributes={},\n", - " dimension_names=None,\n", - " zarr_format=3,\n", - " node_type='array',\n", - " storage_transformers=())}\n" - ] - } - ], + "outputs": [], "source": [ "consolidated = zarr.open_group(store=store)\n", "consolidated_metadata = consolidated.metadata.consolidated_metadata.metadata\n", "from pprint import pprint\n", + "\n", "pprint(dict(sorted(consolidated_metadata.items())))" ] }, { "cell_type": "markdown", - "id": "c6454acf", + "id": "46", "metadata": {}, "source": [ "### Object Storage as a Zarr Store\n", @@ -903,7 +546,7 @@ }, { "cell_type": "markdown", - "id": "5eb5ff8b", + "id": "47", "metadata": {}, "source": [ "Here are some examples of Zarr stores on the cloud:\n", @@ -916,7 +559,7 @@ }, { "cell_type": "markdown", - "id": "18c6f915", + "id": "48", "metadata": {}, "source": [ "### Xarray and 
Zarr\n", @@ -927,845 +570,10 @@ }, { "cell_type": "code", - "execution_count": 15, - "id": "dba10f47", + "execution_count": null, + "id": "49", "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<xarray.Dataset> Size: 2GB\n",
-       "Dimensions:      (latitude: 180, nv: 2, longitude: 360, time: 9226)\n",
-       "Coordinates:\n",
-       "    lat_bounds   (latitude, nv) float32 1kB dask.array<chunksize=(180, 2), meta=np.ndarray>\n",
-       "  * latitude     (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n",
-       "    lon_bounds   (longitude, nv) float32 3kB dask.array<chunksize=(360, 2), meta=np.ndarray>\n",
-       "  * longitude    (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
-       "  * time         (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n",
-       "    time_bounds  (time, nv) datetime64[ns] 148kB dask.array<chunksize=(200, 2), meta=np.ndarray>\n",
-       "Dimensions without coordinates: nv\n",
-       "Data variables:\n",
-       "    precip       (time, latitude, longitude) float32 2GB dask.array<chunksize=(200, 180, 360), meta=np.ndarray>\n",
-       "Attributes: (12/45)\n",
-       "    Conventions:                CF-1.6, ACDD 1.3\n",
-       "    Metadata_Conventions:       CF-1.6, Unidata Dataset Discovery v1.0, NOAA ...\n",
-       "    acknowledgment:             This project was supported in part by a grant...\n",
-       "    cdm_data_type:              Grid\n",
-       "    cdr_program:                NOAA Climate Data Record Program for satellit...\n",
-       "    cdr_variable:               precipitation\n",
-       "    ...                         ...\n",
-       "    standard_name_vocabulary:   CF Standard Name Table (v41, 22 February 2017)\n",
-       "    summary:                    Global Precipitation Climatology Project (GPC...\n",
-       "    time_coverage_duration:     P1D\n",
-       "    time_coverage_end:          1996-10-01T23:59:59Z\n",
-       "    time_coverage_start:        1996-10-01T00:00:00Z\n",
-       "    title:                      Global Precipitation Climatatology Project (G...
" - ], - "text/plain": [ - " Size: 2GB\n", - "Dimensions: (latitude: 180, nv: 2, longitude: 360, time: 9226)\n", - "Coordinates:\n", - " lat_bounds (latitude, nv) float32 1kB dask.array\n", - " * latitude (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n", - " lon_bounds (longitude, nv) float32 3kB dask.array\n", - " * longitude (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n", - " * time (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n", - " time_bounds (time, nv) datetime64[ns] 148kB dask.array\n", - "Dimensions without coordinates: nv\n", - "Data variables:\n", - " precip (time, latitude, longitude) float32 2GB dask.array\n", - "Attributes: (12/45)\n", - " Conventions: CF-1.6, ACDD 1.3\n", - " Metadata_Conventions: CF-1.6, Unidata Dataset Discovery v1.0, NOAA ...\n", - " acknowledgment: This project was supported in part by a grant...\n", - " cdm_data_type: Grid\n", - " cdr_program: NOAA Climate Data Record Program for satellit...\n", - " cdr_variable: precipitation\n", - " ... ...\n", - " standard_name_vocabulary: CF Standard Name Table (v41, 22 February 2017)\n", - " summary: Global Precipitation Climatology Project (GPC...\n", - " time_coverage_duration: P1D\n", - " time_coverage_end: 1996-10-01T23:59:59Z\n", - " time_coverage_start: 1996-10-01T00:00:00Z\n", - " title: Global Precipitation Climatatology Project (G..." - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr'\n", "\n", @@ -1775,641 +583,17 @@ }, { "cell_type": "code", - "execution_count": 13, - "id": "9d48c39e", + "execution_count": null, + "id": "50", "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<xarray.DataArray 'precip' (time: 9226, latitude: 180, longitude: 360)> Size: 2GB\n",
-       "dask.array<open_dataset-precip, shape=(9226, 180, 360), dtype=float32, chunksize=(200, 180, 360), chunktype=numpy.ndarray>\n",
-       "Coordinates:\n",
-       "  * latitude   (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n",
-       "  * longitude  (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n",
-       "  * time       (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n",
-       "Attributes:\n",
-       "    cell_methods:   area: mean time: mean\n",
-       "    long_name:      NOAA Climate Data Record (CDR) of Daily GPCP Satellite-Ga...\n",
-       "    standard_name:  lwe_precipitation_rate\n",
-       "    units:          mm/day\n",
-       "    valid_range:    [0.0, 100.0]
" - ], - "text/plain": [ - " Size: 2GB\n", - "dask.array\n", - "Coordinates:\n", - " * latitude (latitude) float32 720B -90.0 -89.0 -88.0 ... 87.0 88.0 89.0\n", - " * longitude (longitude) float32 1kB 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0\n", - " * time (time) datetime64[ns] 74kB 1996-10-01 1996-10-02 ... 2021-12-31\n", - "Attributes:\n", - " cell_methods: area: mean time: mean\n", - " long_name: NOAA Climate Data Record (CDR) of Daily GPCP Satellite-Ga...\n", - " standard_name: lwe_precipitation_rate\n", - " units: mm/day\n", - " valid_range: [0.0, 100.0]" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "ds.precip" ] }, { "cell_type": "markdown", - "id": "76756fb5", + "id": "51", "metadata": {}, "source": [ "::::{admonition} Exercise\n", @@ -2430,7 +614,7 @@ }, { "cell_type": "markdown", - "id": "23a1ef1a", + "id": "52", "metadata": {}, "source": [ "In the next exercise, you will use the Xarray + Zarr to open CMIP6 dataset.\n", @@ -2445,17 +629,12 @@ }, { "cell_type": "markdown", - "id": "2cd8bea6", + "id": "53", "metadata": {}, "source": [] } ], "metadata": { - "kernelspec": { - "display_name": "zarr_tutorial", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -2465,8 +644,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.5" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From 1674e45c74495415d26f5256f03c7a669b1839b5 Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Sun, 6 Jul 2025 16:16:30 -0700 Subject: [PATCH 14/20] minor updates to zarr tutorial --- intermediate/intro-to-zarr.ipynb | 78 ++++++++++++++++++++++---------- 1 file changed, 53 insertions(+), 25 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index d8ab5ac8..43a1c0f1 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -9,14 +9,14 @@ "\n", "## Learning Objectives:\n", "\n", - "- Understand the principles of the Zarr file format\n", - "- Learn how to read and write Zarr files using the `zarr-python` library\n", - "- Explore how to use Zarr files with `xarray` for data analysis and visualization\n", + "- Understand the principles of the Zarr data format\n", + "- Learn how to read and write Zarr stores using the `zarr-python` library\n", + "- Explore how to use Zarr stores with `xarray` for data analysis and visualization\n", "\n", "This notebook provides a brief introduction to Zarr and how to\n", "use it in cloud environments for scalable, chunked, and compressed data storage.\n", "\n", - "Zarr is a file format with implementations in different languages. In this tutorial, we will look at an example of how to use the Zarr format by looking at some features of the `zarr-python` library and how Zarr files can be opened with `xarray`.\n", + "Zarr is a data format with implementations in different languages. 
In this tutorial, we will look at an example of how to use the Zarr format by looking at some features of the `zarr-python` library and how Zarr files can be opened with `xarray`.\n", "\n", "## What is Zarr?\n", "\n", @@ -25,17 +25,16 @@ "### Zarr Data Organization:\n", "- **Arrays**: N-dimensional arrays that can be chunked and compressed.\n", "- **Groups**: A container for organizing multiple arrays and other groups with a hierarchical structure.\n", - "- **Metadata**: JSON-like metadata describing the arrays and groups, including information about dimensions, data types, groups, and compression.\n", + "- **Metadata**: JSON-like metadata describing the arrays and groups, including information about data types, dimensions, chunking, compression, and user-defined key-value fields. \n", "- **Dimensions and Shape**: Arrays can have any number of dimensions, and their shape is defined by the number of elements in each dimension.\n", "- **Coordinates & Indexing**: Zarr supports coordinate arrays for each dimension, allowing for efficient indexing and slicing.\n", "\n", - "The diagram below from [the NASA Earthdata wiki](https://wiki.earthdata.nasa.gov/display/ESO/Zarr+Format) showing the structure of a Zarr store:\n", + "The diagram below from [the Zarr v3 specification](https://wiki.earthdata.nasa.gov/display/ESO/Zarr+Format) showing the structure of a Zarr store:\n", "\n", - "![EarthData](https://learning.nceas.ucsb.edu/2025-04-arctic/images/zarr-chunks.png)\n", + "![ZarrSpec](https://zarr-specs.readthedocs.io/en/latest/_images/terminology-hierarchy.excalidraw.png)\n", "\n", "\n", - "NetCDF and Zarr share similar terminology and functionality, but the key difference is that NetCDF is a single file, while Zarr is a directory-based “store” composed of many chunked files—making it better suited for distributed and cloud-based workflows.\n", - "\n" + "NetCDF and Zarr share similar terminology and functionality, but the key difference is that NetCDF is a single file, while Zarr is a directory-based “store” composed of many chunked files, making it better suited for distributed and cloud-based workflows." ] }, { @@ -69,7 +68,7 @@ "source": [ "import zarr\n", "\n", - "z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr', mode='w')\n", + "z = zarr.create_array(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')\n", "z" ] }, @@ -228,11 +227,13 @@ "metadata": {}, "outputs": [], "source": [ - "root = zarr.group()\n", + "store = zarr.storage.MemoryStore()\n", + "root = zarr.create_group(store=store)\n", "temp = root.create_group('temp')\n", "precip = root.create_group('precip')\n", "t2m = temp.create_array('t2m', shape=(100, 100), chunks=(10, 10), dtype='i4')\n", - "prcp = precip.create_array('prcp', shape=(1000, 1000), chunks=(10, 10), dtype='i4')" + "prcp = precip.create_array('prcp', shape=(1000, 1000), chunks=(10, 10), dtype='i4')\n", + "root.tree()" ] }, { @@ -251,7 +252,7 @@ "metadata": {}, "outputs": [], "source": [ - "root['temp']\n", + "display(root['temp'])\n", "root['temp/t2m'][:, 3]" ] }, @@ -281,7 +282,7 @@ "metadata": {}, "outputs": [], "source": [ - "root.tree(expand=True)" + "root.tree()" ] }, { @@ -290,7 +291,7 @@ "metadata": {}, "source": [ "#### Chunking\n", - "Chunking is the process of dividing the data arrays into smaller pieces. This allows for parallel processing and efficient storage.\n", + "Chunking is the process of dividing Zarr arrays into smaller pieces. 
This allows for parallel processing and efficient storage.\n", "\n", "One of the important parameters in Zarr is the chunk shape, which determines how the data is divided into smaller, manageable pieces. This is crucial for performance, especially when working with large datasets.\n", "\n", @@ -329,7 +330,7 @@ "metadata": {}, "outputs": [], "source": [ - "c = zarr.create(shape=(200, 200, 200), chunks=(1, 200, 200), dtype='f8', store='c.zarr')\n", + "c = zarr.create_array(shape=(200, 200, 200), chunks=(1, 200, 200), dtype='f8', store='c.zarr')\n", "c[:] = np.random.randn(*c.shape)" ] }, @@ -350,7 +351,7 @@ "metadata": {}, "outputs": [], "source": [ - "d = zarr.create(shape=(200, 200, 200), chunks=(200, 200, 1), dtype='f8', store='d.zarr')\n", + "d = zarr.create_array(shape=(200, 200, 200), chunks=(200, 200, 1), dtype='f8', store='d.zarr')\n", "d[:] = np.random.randn(*d.shape)" ] }, @@ -377,8 +378,12 @@ "- File systems struggle with too many small files.\n", "- Small files (e.g., 1 MB or less) may waste space due to filesystem block size.\n", "- Object storage systems (e.g., S3) can slow down with a high number of objects.\n", + "\n", "With sharding, you choose:\n", - "\n" + "- Shard size: the logical shape of each shard, which is expected to include one or more chunks\n", + "- Chunk size: the shape of each compressed chunk\n", + "\n", + "It is important to remember that the shard is the minimum unit of writing. This means that writers must be able to fit the entire shard (including all of the compressed chunks) in memory before writing a shard to a store.\n" ] }, { @@ -526,13 +531,27 @@ "metadata": {}, "outputs": [], "source": [ + "from pprint import pprint\n", + "\n", "consolidated = zarr.open_group(store=store)\n", "consolidated_metadata = consolidated.metadata.consolidated_metadata.metadata\n", - "from pprint import pprint\n", "\n", "pprint(dict(sorted(consolidated_metadata.items())))" ] }, + { + "cell_type": "markdown", + "id": "a571acec-7a65-4a51-ad1e-c80b17494cd3", + "metadata": {}, + "source": [ + "Note that while Zarr-Python supports consolidated metadata for v2 and v3 formatted Zarr stores, it is not technically part of the specification (hence the warning above). \n", + "\n", + "⚠️ Use Caution When ⚠️\n", + "- **Stale or incomplete consolidated metadata**: If the dataset is updated but the consolidated metadata entrypoint isn't re-consolidated, readers may miss chunks or metadata. Always run zarr.consolidate_metadata() after changes.\n", + "- **Concurrent writes or multi-writer pipelines**: Consolidated metadata can lead to inconsistent reads if multiple processes write without coordination. Use with caution in dynamic or shared write environments.\n", + "- **Local filesystems or mixed toolchains**: On local storage, consolidation offers little benefit as hierarchy discovery is generally quite cheap. 
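The first caution above is the one that bites most often: consolidated metadata is a snapshot, so it has to be refreshed whenever the hierarchy changes. A minimal sketch of that workflow, reusing only calls shown in this notebook:

```python
import zarr

store = zarr.storage.MemoryStore()
group = zarr.create_group(store=store)
group.create_array('a', shape=(1,), dtype='f8')
zarr.consolidate_metadata(store)

# Later the hierarchy grows...
group.create_array('b', shape=(2, 2), dtype='f8')

# ...so re-consolidate, otherwise readers relying on the consolidated
# snapshot will not see the new array 'b'.
zarr.consolidate_metadata(store)

consolidated = zarr.open_group(store=store)
print(sorted(consolidated.metadata.consolidated_metadata.metadata))
```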
" + ] + }, { "cell_type": "markdown", "id": "46", @@ -575,6 +594,8 @@ "metadata": {}, "outputs": [], "source": [ + "import xarray as xr\n", + "\n", "store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr'\n", "\n", "ds = xr.open_dataset(store, engine='zarr', chunks={}, consolidated=True)\n", @@ -599,14 +620,13 @@ "::::{admonition} Exercise\n", ":class: tip\n", "\n", - "Can you calculate the mean precipitation over the time dimension in the GPCP dataset and plot it?\n", + "Can you calculate the mean precipitation for January 2020 in the GPCP dataset and plot it?\n", "\n", ":::{admonition} Solution\n", ":class: dropdown\n", "\n", "```python\n", - "ds.precip.mean(dim='time').plot()\n", - "\n", + "ds.precip.sel(time=slice('2020-01-01', '2020-01-31')).mean(dim='time').plot()\n", "```\n", ":::\n", "::::" @@ -628,13 +648,20 @@ ] }, { - "cell_type": "markdown", - "id": "53", + "cell_type": "code", + "execution_count": null, + "id": "09c50842-b522-4f3f-b04a-da22f9131b86", "metadata": {}, + "outputs": [], "source": [] } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -644,7 +671,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.12.11" } }, "nbformat": 4, From b8edee684c7d7eb36094c8ba162072f6600925ab Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Sun, 6 Jul 2025 23:16:56 +0000 Subject: [PATCH 15/20] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/intro-to-zarr.ipynb | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 43a1c0f1..18aba959 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -541,7 +541,7 @@ }, { "cell_type": "markdown", - "id": "a571acec-7a65-4a51-ad1e-c80b17494cd3", + "id": "46", "metadata": {}, "source": [ "Note that while Zarr-Python supports consolidated metadata for v2 and v3 formatted Zarr stores, it is not technically part of the specification (hence the warning above). 
\n", @@ -554,7 +554,7 @@ }, { "cell_type": "markdown", - "id": "46", + "id": "47", "metadata": {}, "source": [ "### Object Storage as a Zarr Store\n", @@ -565,7 +565,7 @@ }, { "cell_type": "markdown", - "id": "47", + "id": "48", "metadata": {}, "source": [ "Here are some examples of Zarr stores on the cloud:\n", @@ -578,7 +578,7 @@ }, { "cell_type": "markdown", - "id": "48", + "id": "49", "metadata": {}, "source": [ "### Xarray and Zarr\n", @@ -590,7 +590,7 @@ { "cell_type": "code", "execution_count": null, - "id": "49", + "id": "50", "metadata": {}, "outputs": [], "source": [ @@ -605,7 +605,7 @@ { "cell_type": "code", "execution_count": null, - "id": "50", + "id": "51", "metadata": {}, "outputs": [], "source": [ @@ -614,7 +614,7 @@ }, { "cell_type": "markdown", - "id": "51", + "id": "52", "metadata": {}, "source": [ "::::{admonition} Exercise\n", @@ -634,7 +634,7 @@ }, { "cell_type": "markdown", - "id": "52", + "id": "53", "metadata": {}, "source": [ "In the next exercise, you will use the Xarray + Zarr to open CMIP6 dataset.\n", @@ -650,18 +650,13 @@ { "cell_type": "code", "execution_count": null, - "id": "09c50842-b522-4f3f-b04a-da22f9131b86", + "id": "54", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -671,8 +666,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.11" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From 3bf111d3dbe53f5e07707783db264f68a8ad2436 Mon Sep 17 00:00:00 2001 From: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Date: Mon, 7 Jul 2025 11:05:54 +0200 Subject: [PATCH 16/20] ensure zarr notebook is rendered, add cell metadata for large outputs --- _toc.yml | 1 + intermediate/intro-to-zarr.ipynb | 50 ++++++++++++++++++++------------ 2 files changed, 32 insertions(+), 19 deletions(-) diff --git a/_toc.yml b/_toc.yml index b8cfb204..fbf8c3d2 100644 --- a/_toc.yml +++ b/_toc.yml @@ -44,6 +44,7 @@ parts: - file: intermediate/indexing/boolean-masking-indexing.ipynb - file: intermediate/hierarchical_computation.ipynb - file: intermediate/xarray_and_dask + - file: intermediate/intro-to-zarr.ipynb - file: intermediate/xarray_ecosystem - file: intermediate/hvplot - file: intermediate/remote_data/index diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 18aba959..e688ff7c 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -67,8 +67,16 @@ "outputs": [], "source": [ "import zarr\n", + "import pathlib\n", + "import shutil\n", "\n", - "z = zarr.create_array(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')\n", + "# Ensure we start with a clean directory for the tutorial\n", + "datadir = pathlib.Path('../data/zarr-tutorial')\n", + "if datadir.exists():\n", + " shutil.rmtree(datadir)\n", + "\n", + "output = datadir / 'test.zarr'\n", + "z = zarr.create_array(shape=(40, 50), chunks=(10, 10), dtype='f8', store=output)\n", "z" ] }, @@ -188,20 +196,28 @@ "cell_type": "code", "execution_count": null, "id": "14", - "metadata": {}, + "metadata": { + "tags": [ + "scroll-output" + ] + }, "outputs": [], "source": [ - "!tree -a test.zarr" + "!tree -a ../data/test.zarr" ] }, { "cell_type": "code", "execution_count": null, "id": "15", - "metadata": {}, + "metadata": { + "tags": [ + "scroll-output" + ] + }, "outputs": [], 
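Depending on what gets written next, removing the data directory without recreating it can make later writes fail. A slightly more defensive variant of this setup pattern (a sketch, not part of these patches) clears and recreates the directory in one step:

```python
import pathlib
import shutil

def fresh_dir(path) -> pathlib.Path:
    """Remove `path` if it exists, then recreate it empty."""
    p = pathlib.Path(path)
    if p.exists():
        shutil.rmtree(p)
    p.mkdir(parents=True, exist_ok=True)
    return p

datadir = fresh_dir('../data/zarr-tutorial')
```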
"source": [ - "!cat test.zarr/zarr.json" + "!cat ../data/test.zarr/zarr.json" ] }, { @@ -330,7 +346,8 @@ "metadata": {}, "outputs": [], "source": [ - "c = zarr.create_array(shape=(200, 200, 200), chunks=(1, 200, 200), dtype='f8', store='c.zarr')\n", + "output = datadir / 'c.zarr'\n", + "c = zarr.create_array(shape=(200, 200, 200), chunks=(1, 200, 200), dtype='f8', store=output)\n", "c[:] = np.random.randn(*c.shape)" ] }, @@ -351,7 +368,8 @@ "metadata": {}, "outputs": [], "source": [ - "d = zarr.create_array(shape=(200, 200, 200), chunks=(200, 200, 1), dtype='f8', store='d.zarr')\n", + "output = datadir / 'd.zarr'\n", + "d = zarr.create_array(shape=(200, 200, 200), chunks=(200, 200, 1), dtype='f8', store=output)\n", "d[:] = np.random.randn(*d.shape)" ] }, @@ -401,8 +419,6 @@ "metadata": {}, "outputs": [], "source": [ - "import zarr\n", - "\n", "z6 = zarr.create_array(\n", " store={},\n", " shape=(10000, 10000, 1000),\n", @@ -485,7 +501,7 @@ "id": "41", "metadata": {}, "source": [ - "```{info}\n", + "```{note}\n", "`.info_complete()` provides a more detailed view of the Zarr array, including metadata about the chunks, compressors, and attributes, but will be slower for larger arrays. \n", "```" ] @@ -528,7 +544,11 @@ "cell_type": "code", "execution_count": null, "id": "45", - "metadata": {}, + "metadata": { + "tags": [ + "hide-output" + ] + }, "outputs": [], "source": [ "from pprint import pprint\n", @@ -646,14 +666,6 @@ "- [Scalable and Computationally Reproducible Approaches to Arctic Research](https://learning.nceas.ucsb.edu/2025-04-arctic/sections/zarr.html)\n", "- [Zarr Cloud Native Geospatial Tutorial](https://github.com/zarr-developers/tutorials/blob/main/zarr_cloud_native_geospatial_2022.ipynb)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "54", - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { From e33159bdfd55c0008e3eb7e80348aff5c765fe35 Mon Sep 17 00:00:00 2001 From: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Date: Mon, 7 Jul 2025 11:24:53 +0200 Subject: [PATCH 17/20] remove accidentaly advaced-indexing edit --- intermediate/indexing/advanced-indexing.ipynb | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/intermediate/indexing/advanced-indexing.ipynb b/intermediate/indexing/advanced-indexing.ipynb index 27de401e..9870db21 100644 --- a/intermediate/indexing/advanced-indexing.ipynb +++ b/intermediate/indexing/advanced-indexing.ipynb @@ -197,18 +197,6 @@ "da_air.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "target_lat = xr.DataArray([31, 41, 42, 42], dims=\"degrees_north\")\n", - "target_lon = xr.DataArray([200, 201, 202, 205], dims=\"degrees_east\")\n", - "\n", - "da.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing" - ] - }, { "cell_type": "markdown", "metadata": {}, From c411993d4d97748fd915b7e1f0b1a7c4e4ca6649 Mon Sep 17 00:00:00 2001 From: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Date: Mon, 7 Jul 2025 11:25:39 +0200 Subject: [PATCH 18/20] use data subfolder for io notebook as in zarr notebook --- fundamentals/01.1_io.ipynb | 75 ++++++++++++++++++++++---------------- 1 file changed, 44 insertions(+), 31 deletions(-) diff --git a/fundamentals/01.1_io.ipynb b/fundamentals/01.1_io.ipynb index f5328c4c..317c961c 100644 --- a/fundamentals/01.1_io.ipynb +++ b/fundamentals/01.1_io.ipynb @@ -51,9 +51,27 @@ ] }, { - 
"cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "id": "2", "metadata": {}, + "outputs": [], + "source": [ + "# Ensure we start with a clean directory for the tutorial\n", + "import pathlib\n", + "import shutil\n", + "\n", + "datadir = pathlib.Path('../data/io-tutorial')\n", + "if datadir.exists():\n", + " shutil.rmtree(datadir)\n", + "else:\n", + " datadir.mkdir()" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, "source": [ "The constructor of `Dataset` takes three parameters:\n", "\n", @@ -66,7 +84,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3", + "id": "4", "metadata": {}, "outputs": [], "source": [ @@ -94,16 +112,16 @@ ")\n", "\n", "# write datasets\n", - "ds1.to_netcdf(\"ds1.nc\")\n", - "ds2.to_netcdf(\"ds2.nc\")\n", + "ds1.to_netcdf(datadir / \"ds1.nc\")\n", + "ds2.to_netcdf(datadir / \"ds2.nc\")\n", "\n", "# write dataarray\n", - "ds1.a.to_netcdf(\"da1.nc\")" + "ds1.a.to_netcdf(datadir / \"da1.nc\")" ] }, { "cell_type": "markdown", - "id": "4", + "id": "5", "metadata": {}, "source": [ "Reading those files is just as simple:\n" @@ -112,26 +130,26 @@ { "cell_type": "code", "execution_count": null, - "id": "5", + "id": "6", "metadata": {}, "outputs": [], "source": [ - "xr.open_dataset(\"ds1.nc\")" + "xr.open_dataset(datadir / \"ds1.nc\")" ] }, { "cell_type": "code", "execution_count": null, - "id": "6", + "id": "7", "metadata": {}, "outputs": [], "source": [ - "xr.open_dataarray(\"da1.nc\")" + "xr.open_dataarray(datadir / \"da1.nc\")" ] }, { "cell_type": "markdown", - "id": "7", + "id": "8", "metadata": {}, "source": [ "\n", @@ -151,16 +169,16 @@ { "cell_type": "code", "execution_count": null, - "id": "8", + "id": "9", "metadata": {}, "outputs": [], "source": [ - "ds1.to_zarr(\"ds1.zarr\", mode=\"w\")" + "ds1.to_zarr(datadir / \"ds1.zarr\", mode=\"w\")" ] }, { "cell_type": "markdown", - "id": "9", + "id": "10", "metadata": {}, "source": [ "We can then read the created file with:\n" @@ -169,16 +187,16 @@ { "cell_type": "code", "execution_count": null, - "id": "10", + "id": "11", "metadata": {}, "outputs": [], "source": [ - "xr.open_zarr(\"ds1.zarr\", chunks=None)" + "xr.open_zarr(datadir / \"ds1.zarr\", chunks=None)" ] }, { "cell_type": "markdown", - "id": "11", + "id": "12", "metadata": {}, "source": [ "setting the `chunks` parameter to `None` avoids `dask` (more on that in a later\n", @@ -187,7 +205,7 @@ }, { "cell_type": "markdown", - "id": "12", + "id": "13", "metadata": {}, "source": [ "**tip:** You can write to any dictionary-like (`MutableMapping`) interface:" @@ -196,7 +214,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "14", "metadata": {}, "outputs": [], "source": [ @@ -207,7 +225,7 @@ }, { "cell_type": "markdown", - "id": "14", + "id": "15", "metadata": {}, "source": [ "## Raster files using rioxarray\n", @@ -220,7 +238,7 @@ { "cell_type": "code", "execution_count": null, - "id": "15", + "id": "16", "metadata": {}, "outputs": [], "source": [ @@ -241,16 +259,16 @@ { "cell_type": "code", "execution_count": null, - "id": "16", + "id": "17", "metadata": {}, "outputs": [], "source": [ - "da.rio.to_raster('ds1_a.tiff')" + "da.rio.to_raster(datadir / 'ds1_a.tiff')" ] }, { "cell_type": "markdown", - "id": "17", + "id": "18", "metadata": {}, "source": [ "NOTE: you can now load this file into GIS tools like [QGIS](https://www.qgis.org)! 
Or open back into Xarray:" @@ -259,11 +277,11 @@ { "cell_type": "code", "execution_count": null, - "id": "18", + "id": "19", "metadata": {}, "outputs": [], "source": [ - "DA = xr.open_dataarray('ds1_a.tiff', engine='rasterio')\n", + "DA = xr.open_dataarray(datadir / 'ds1_a.tiff', engine='rasterio')\n", "DA.rio.crs" ] } @@ -279,11 +297,6 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" - }, - "vscode": { - "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" - } } }, "nbformat": 4, From 78acafadb2e09c42757c20844d41e361e4137be8 Mon Sep 17 00:00:00 2001 From: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Date: Mon, 7 Jul 2025 11:33:08 +0200 Subject: [PATCH 19/20] fix cat and tree paths, toggle output --- intermediate/intro-to-zarr.ipynb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index e688ff7c..33bdf540 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -198,12 +198,12 @@ "id": "14", "metadata": { "tags": [ - "scroll-output" + "hide-output" ] }, "outputs": [], "source": [ - "!tree -a ../data/test.zarr" + "!tree -a {output}" ] }, { @@ -212,12 +212,12 @@ "id": "15", "metadata": { "tags": [ - "scroll-output" + "hide-output" ] }, "outputs": [], "source": [ - "!cat ../data/test.zarr/zarr.json" + "!cat {output}/zarr.json" ] }, { From fe50cbf045d97e3ead3f214147b66c005bbab437 Mon Sep 17 00:00:00 2001 From: Scott Henderson <3924836+scottyhq@users.noreply.github.com> Date: Mon, 7 Jul 2025 11:48:02 +0200 Subject: [PATCH 20/20] link to cmip6 notebook at end --- intermediate/intro-to-zarr.ipynb | 2 +- intermediate/remote_data/cmip6-cloud.ipynb | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/intermediate/intro-to-zarr.ipynb b/intermediate/intro-to-zarr.ipynb index 33bdf540..e991cf4f 100644 --- a/intermediate/intro-to-zarr.ipynb +++ b/intermediate/intro-to-zarr.ipynb @@ -657,7 +657,7 @@ "id": "53", "metadata": {}, "source": [ - "In the next exercise, you will use the Xarray + Zarr to open CMIP6 dataset.\n", + "Check out our other [tutorial notebook]() that highlights the CMIP6 Zarr dataset stored in the Cloud\n", "\n", "## Additional Resources\n", "\n", diff --git a/intermediate/remote_data/cmip6-cloud.ipynb b/intermediate/remote_data/cmip6-cloud.ipynb index e93eabf6..fa27ff7f 100644 --- a/intermediate/remote_data/cmip6-cloud.ipynb +++ b/intermediate/remote_data/cmip6-cloud.ipynb @@ -5,6 +5,7 @@ "id": "0", "metadata": {}, "source": [ + "(cmip6-cloud)=\n", "# Zarr in Cloud Object Storage\n", "\n", "In this tutorial, we'll cover the following:\n",