Edits to "Xarray and Dask" #177

Merged
8 commits merged on Jun 15, 2023
33 changes: 14 additions & 19 deletions intermediate/xarray_and_dask.ipynb
@@ -230,9 +230,6 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
@@ -246,16 +243,13 @@
"source": [
"### Exercise\n",
"\n",
"Try calling `mean.values` and `mean.data`. Do you understand the difference?"
"Try calling `ds.air.values` and `ds.air.data`. Do you understand the difference?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
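Reviewer note on the exercise in this hunk: on a dask-backed `DataArray`, `.data` returns the underlying dask array without computing anything, while `.values` triggers computation and returns a numpy array. A minimal sketch, using a small synthetic array in place of the tutorial's `ds.air`:

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for the tutorial's dask-backed ds.air
air = xr.DataArray(da.ones((4, 4), chunks=(2, 2)), dims=("lat", "lon"))

print(type(air.data))    # dask array: still lazy, nothing is computed
print(type(air.values))  # numpy array: computes and loads the values
```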
@@ -331,7 +325,7 @@
"2. `.load()` replaces the dask array in the xarray object with a numpy array.\n",
" This is equivalent to `ds = ds.compute()`\n",
" \n",
"**Tip:** There is a third option : \"persisting\". `.persist()` loads the values into distributed RAM. The values are computed but remain distributed across workers. So `ds.air.persist()` still returns a dask array. This is useful if you will be repeatedly using a dataset for computation but it is too large to load into local memory. You will see a persistent task on the dashboard. See the [dask user guide](https://docs.dask.org/en/latest/api.html#dask.persist) for more on persisting\n"
"**Tip:** There is a third option : \"persisting\". `.persist()` loads the values into distributed RAM. The values are computed but remain distributed across workers. So `ds.air.persist()` still returns a dask array. This is useful if you will be repeatedly using a dataset for computation but it is too large to load into local memory. You will see a persistent task on the dashboard. See the [dask user guide](https://docs.dask.org/en/latest/api.html#dask.persist) for more on persisting"
]
},
{
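The three options discussed in this hunk (`.compute()`, `.load()`, `.persist()`) can be sketched as follows. The dataset is a small synthetic stand-in, and `.persist()` runs here on the default local scheduler rather than a distributed cluster:

```python
import dask.array as da
import numpy as np
import xarray as xr

def make_lazy():
    # Small synthetic stand-in for the tutorial dataset
    return xr.Dataset({"air": (("lat", "lon"), da.ones((6, 6), chunks=(3, 3)))})

ds = make_lazy()
computed = ds.compute()                   # 1. returns a new numpy-backed object
assert isinstance(computed.air.data, np.ndarray)
assert isinstance(ds.air.data, da.Array)  #    the original stays lazy

ds.load()                                 # 2. replaces the dask arrays in ds itself
assert isinstance(ds.air.data, np.ndarray)

persisted = make_lazy().air.persist()     # 3. computed, but still a dask array
assert isinstance(persisted.data, da.Array)
```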
@@ -347,9 +341,6 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
@@ -446,7 +437,10 @@
"\n",
"You can use any kind of Dask cluster. This step is completely independent of\n",
"xarray. While not strictly necessary, the dashboard provides a nice learning\n",
"tool."
"tool.\n",
"\n",
"By default, Dask uses the current working directory for writing temporary files.\n",
"We choose to use a temporary scratch folder `local_directory='/tmp'` in the example below instead."
]
},
{
@@ -464,10 +458,17 @@
"# if os.environ.get('JUPYTERHUB_USER'):\n",
"# dask.config.set(**{\"distributed.dashboard.link\": \"/user/{JUPYTERHUB_USER}/proxy/{port}/status\"})\n",
"\n",
"client = Client(local_directory='/tmp')\n",
"client = Client()\n",
"client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
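For reference, the cluster setup this hunk discusses can be sketched as below. `processes=False` is an assumption added only to keep the example lightweight; the tutorial itself starts a default multi-process cluster:

```python
from dask.distributed import Client

# processes=False (threads only) is an assumption to keep this sketch light;
# local_directory sends worker spill files to /tmp instead of the working directory
client = Client(processes=False, local_directory="/tmp")
print(client.dashboard_link)  # URL of the diagnostic dashboard
client.close()
```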
@@ -483,9 +484,6 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
Expand Down Expand Up @@ -539,9 +537,6 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],