|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "0ebd8a4d-6937-4ad6-9c93-fa944fb389c1", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Accessing remote data stored on the cloud\n", |
| 9 | + "\n", |
| 10 | + "In this tutorial, we'll cover the following:\n", |
| 11 | + "- Finding a cloud-hosted Zarr archive of CMIP6 dataset(s)\n", |
| 12 | + "- Accessing a single CMIP6 dataset (sea surface height) remotely\n", |
| 13 | + "- Calculating the projected sea level change in 2100 compared to 2015" |
| 14 | + ] |
| 15 | + }, |
| 16 | + { |
| 17 | + "cell_type": "code", |
| 18 | + "execution_count": null, |
| 19 | + "id": "b7533f0e-5dd1-423a-9a04-8ed755d180a2", |
| 20 | + "metadata": {}, |
| 21 | + "outputs": [], |
| 22 | + "source": [ |
| 23 | + "import gcsfs\n", |
| 24 | + "import pandas as pd\n", |
| 25 | + "import xarray as xr" |
| 26 | + ] |
| 27 | + }, |
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "id": "95002377-b0a6-479f-928d-53b044b390df", |
| 31 | + "metadata": {}, |
| 32 | + "source": [ |
| 33 | + "## Finding cloud-native data\n", |
| 34 | + "\n", |
| 35 | + "Cloud-native data means data that is structured for efficient querying across the network.\n", |
| 36 | + "Typically, this means having metadata that describes the entire file in the header of the\n", |
| 37 | + "file, or having a separate pointer file (so that there is no need to download everything first).\n", |
| 38 | + "\n", |
| 39 | + "Quite commonly, you'll see cloud-native datasets stored on these\n", |
| 40 | + "three object storage providers, though there are many other ones too.\n", |
| 41 | + "\n", |
| 42 | + "- [Amazon Simple Storage Service (S3)](https://aws.amazon.com/s3)\n", |
| 43 | + "- [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs)\n", |
| 44 | + "- [Google Cloud Storage](https://cloud.google.com/storage)" |
| 45 | + ] |
| 46 | + }, |
| 47 | + { |
| 48 | + "cell_type": "markdown", |
| 49 | + "id": "bc520e32-204f-4f92-bdec-4f678160d6de", |
| 50 | + "metadata": {}, |
| 51 | + "source": [ |
| 52 | + "### Getting cloud-hosted CMIP6 data\n", |
| 53 | + "\n", |
| 54 | + "The [Coupled Model Intercomparison Project Phase 6 (CMIP6)](https://en.wikipedia.org/wiki/CMIP6#CMIP_Phase_6)\n", |
| 55 | + "dataset is a rich archive of modelling experiments carried out to project the impacts of climate change.\n", |
| 56 | + "The cloud-hosted copies used here are stored in the [Zarr](https://zarr.dev) format, and we'll go over how to access them.\n", |
| 57 | + "\n", |
| 58 | + "Sources:\n", |
| 59 | + "- https://esgf-node.llnl.gov/search/cmip6/\n", |
| 60 | + "- CMIP6 data hosted on Google Cloud - https://console.cloud.google.com/marketplace/details/noaa-public/cmip6\n", |
| 61 | + "- Pangeo/ESGF Cloud Data Access tutorial - https://pangeo-data.github.io/pangeo-cmip6-cloud/accessing_data.html" |
| 62 | + ] |
| 63 | + }, |
| 64 | + { |
| 65 | + "cell_type": "markdown", |
| 66 | + "id": "8d12400d-ab5e-420e-b9f5-b61e083dc9ce", |
| 67 | + "metadata": {}, |
| 68 | + "source": [ |
| 69 | + "First, let's open a CSV file listing the available CMIP6 datasets." |
| 70 | + ] |
| 71 | + }, |
| 72 | + { |
| 73 | + "cell_type": "code", |
| 74 | + "execution_count": null, |
| 75 | + "id": "d1d9f94c-dbe3-4151-8ee7-fa182724810b", |
| 76 | + "metadata": {}, |
| 77 | + "outputs": [], |
| 78 | + "source": [ |
| 79 | + "df = pd.read_csv(\"https://cmip6.storage.googleapis.com/pangeo-cmip6.csv\")\n", |
| 80 | + "print(f\"Number of rows: {len(df)}\")\n", |
| 81 | + "df.head()" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "markdown", |
| 86 | + "id": "eb263332-dc60-4bd1-9ef3-cf9612cf09a1", |
| 87 | + "metadata": {}, |
| 88 | + "source": [ |
| 89 | + "That's a lot of rows! Let's filter the table down to the variable and experiment\n", |
| 90 | + "we're interested in, e.g. sea surface height.\n", |
| 91 | + "\n", |
| 92 | + "To find the right `variable_id`, you can search by keyword in the spreadsheet at\n", |
| 93 | + "https://docs.google.com/spreadsheets/d/1UUtoz6Ofyjlpx5LdqhKcwHFz2SGoTQV2_yekHyMfL9Y\n", |
| 94 | + "\n", |
| 95 | + "For the `experiment_id`, download the spreadsheet from\n", |
| 96 | + "https://github.com/ES-DOC/esdoc-docs/blob/master/cmip6/experiments/spreadsheet/experiments.xlsx,\n", |
| 97 | + "go to the 'experiment' tab, and find the one you're interested in.\n", |
| 98 | + "\n", |
| 99 | + "Another good place to find the right model runs is https://esgf-node.llnl.gov/search/cmip6\n", |
| 100 | + "(once you get your head around the acronyms and short names)." |
| 101 | + ] |
| 102 | + }, |
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "id": "9b435c14-fd56-481c-b5f4-781794a1cc1a", |
| 106 | + "metadata": {}, |
| 107 | + "source": [ |
| 108 | + "Below, we'll filter to CMIP6 experiments matching:\n", |
| 109 | + "- Sea Surface Height Above Geoid [m] (variable_id: `zos`)\n", |
| 110 | + "- Shared Socioeconomic Pathway 5 with 8.5 W/m² radiative forcing, SSP5-8.5 (experiment_id: `ssp585`)" |
| 111 | + ] |
| 112 | + }, |
| 113 | + { |
| 114 | + "cell_type": "code", |
| 115 | + "execution_count": null, |
| 116 | + "id": "2fe50e53-b02f-4a84-bc4a-e1934fe32661", |
| 117 | + "metadata": {}, |
| 118 | + "outputs": [], |
| 119 | + "source": [ |
| 120 | + "df_zos = df.query(\"variable_id == 'zos' & experiment_id == 'ssp585'\")\n", |
| 121 | + "df_zos" |
| 122 | + ] |
| 123 | + }, |
| 124 | + { |
| 125 | + "cell_type": "markdown", |
| 126 | + "id": "9ddfad3e-d4de-4c0a-be6f-53f1f7928f51", |
| 127 | + "metadata": {}, |
| 128 | + "source": [ |
| 129 | + "There are 272 matching datasets for SSP5-8.5.\n", |
| 130 | + "Let's just get the URL to the first one in the list for now." |
| 131 | + ] |
| 132 | + }, |
| 133 | + { |
| 134 | + "cell_type": "code", |
| 135 | + "execution_count": null, |
| 136 | + "id": "5515186d-8571-439a-b5a8-b8b56aab77f6", |
| 137 | + "metadata": {}, |
| 138 | + "outputs": [], |
| 139 | + "source": [ |
| 140 | + "print(df_zos.zstore.iloc[0])" |
| 141 | + ] |
| 142 | + }, |
| 143 | + { |
| 144 | + "cell_type": "markdown", |
| 145 | + "id": "b68bcfbb-24c9-420d-b297-44c678b7f8ce", |
| 146 | + "metadata": {}, |
| 147 | + "source": [ |
| 148 | + "## Reading from the remote Zarr storage" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "id": "b3f5660d-bd46-44f6-8f6d-a62947b6f2c4", |
| 154 | + "metadata": {}, |
| 155 | + "source": [ |
| 156 | + "In many cases, you'll need to first connect to the cloud provider.\n", |
| 157 | + "The CMIP6 dataset allows anonymous access, but in other cases\n", |
| 158 | + "you may need to authenticate." |
| 159 | + ] |
| 160 | + }, |
| 161 | + { |
| 162 | + "cell_type": "code", |
| 163 | + "execution_count": null, |
| 164 | + "id": "a4e6d5e3-35a0-4c31-a1b8-96258cf50974", |
| 165 | + "metadata": {}, |
| 166 | + "outputs": [], |
| 167 | + "source": [ |
| 168 | + "fs = gcsfs.GCSFileSystem(token=\"anon\")" |
| 169 | + ] |
| 170 | + }, |
| 171 | + { |
| 172 | + "cell_type": "markdown", |
| 173 | + "id": "b959f829-e434-4a84-82d2-2f2b24dc84d2", |
| 174 | + "metadata": {}, |
| 175 | + "source": [ |
| 176 | + "Next, we'll need a mapping to the Google Cloud Storage object.\n", |
| 177 | + "This can be done using `fs.get_mapper`.\n", |
| 178 | + "\n", |
| 179 | + "A more generic way (for other cloud providers) is to use\n", |
| 180 | + "[`fsspec.get_mapper`](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.get_mapper) instead." |
| 181 | + ] |
| 182 | + }, |
| 183 | + { |
| 184 | + "cell_type": "code", |
| 185 | + "execution_count": null, |
| 186 | + "id": "e1527d1f-503e-4b0b-8433-794067ed46cc", |
| 187 | + "metadata": {}, |
| 188 | + "outputs": [], |
| 189 | + "source": [ |
| 190 | + "store = fs.get_mapper(\n", |
| 191 | + " \"gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp585/r1i1p1f1/Omon/zos/gn/v20180701/\"\n", |
| 192 | + ")" |
| 193 | + ] |
| 194 | + }, |
| 195 | + { |
| 196 | + "cell_type": "markdown", |
| 197 | + "id": "b694baac-9259-4de8-8eae-ac3cb653d894", |
| 198 | + "metadata": {}, |
| 199 | + "source": [ |
| 200 | + "With that, we can open the Zarr store like so." |
| 201 | + ] |
| 202 | + }, |
| 203 | + { |
| 204 | + "cell_type": "code", |
| 205 | + "execution_count": null, |
| 206 | + "id": "74b6d289-a852-4216-a3b6-4483d5ff2854", |
| 207 | + "metadata": {}, |
| 208 | + "outputs": [], |
| 209 | + "source": [ |
| 210 | + "ds = xr.open_zarr(store=store, consolidated=True)\n", |
| 211 | + "ds" |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "markdown", |
| 216 | + "id": "d81a5958-517b-4215-8c02-b1083b4b4fe2", |
| 217 | + "metadata": {}, |
| 218 | + "source": [ |
| 219 | + "### Selecting time slices\n", |
| 220 | + "\n", |
| 221 | + "Let's say we want to calculate sea level change between\n", |
| 222 | + "2015 and 2100. We can access just the specific time points\n", |
| 223 | + "needed using [`xr.Dataset.sel`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sel.html)." |
| 224 | + ] |
| 225 | + }, |
| 226 | + { |
| 227 | + "cell_type": "code", |
| 228 | + "execution_count": null, |
| 229 | + "id": "1101b455-ba65-4cab-a3b6-bf2601958400", |
| 230 | + "metadata": {}, |
| 231 | + "outputs": [], |
| 232 | + "source": [ |
| 233 | + "zos_2015jan = ds.zos.sel(time=\"2015-01-16\").squeeze()\n", |
| 234 | + "zos_2100dec = ds.zos.sel(time=\"2100-12-16\").squeeze()" |
| 235 | + ] |
| 236 | + }, |
| 237 | + { |
| 238 | + "cell_type": "markdown", |
| 239 | + "id": "fb8d90a2-9883-41da-b26c-7b5547a15270", |
| 240 | + "metadata": {}, |
| 241 | + "source": [ |
| 242 | + "Sea level change would just be 2100 minus 2015." |
| 243 | + ] |
| 244 | + }, |
| 245 | + { |
| 246 | + "cell_type": "code", |
| 247 | + "execution_count": null, |
| 248 | + "id": "1f5fa1ee-260c-4ec4-898a-230826f9f2c8", |
| 249 | + "metadata": {}, |
| 250 | + "outputs": [], |
| 251 | + "source": [ |
| 252 | + "sealevelchange = zos_2100dec - zos_2015jan" |
| 253 | + ] |
| 254 | + }, |
| 255 | + { |
| 256 | + "cell_type": "markdown", |
| 257 | + "id": "0e087f3b-0315-40db-ae03-a3393b49c30e", |
| 258 | + "metadata": {}, |
| 259 | + "source": [ |
| 260 | + "Note that up to this point, we have not actually downloaded any\n", |
| 261 | + "(big) data yet from the cloud. This is all working based on\n", |
| 262 | + "metadata only.\n", |
| 263 | + "\n", |
| 264 | + "To bring the data from the cloud to your local computer, call `.compute()`.\n", |
| 265 | + "This may take a while, depending on your connection speed." |
| 266 | + ] |
| 267 | + }, |
| 268 | + { |
| 269 | + "cell_type": "code", |
| 270 | + "execution_count": null, |
| 271 | + "id": "38c2152e-67e7-449e-8f1a-2d64f63dedda", |
| 272 | + "metadata": {}, |
| 273 | + "outputs": [], |
| 274 | + "source": [ |
| 275 | + "sealevelchange = sealevelchange.compute()" |
| 276 | + ] |
| 277 | + }, |
| 278 | + { |
| 279 | + "cell_type": "markdown", |
| 280 | + "id": "5226729f-07db-4fe6-a980-9a1f630c8277", |
| 281 | + "metadata": {}, |
| 282 | + "source": [ |
| 283 | + "We can do a quick plot to show how sea level is projected to change\n", |
| 284 | + "between 2015 and 2100 (from one modelled experiment)." |
| 285 | + ] |
| 286 | + }, |
| 287 | + { |
| 288 | + "cell_type": "code", |
| 289 | + "execution_count": null, |
| 290 | + "id": "8c42ed9f-fc61-4762-9765-3dd553d5c2ad", |
| 291 | + "metadata": {}, |
| 292 | + "outputs": [], |
| 293 | + "source": [ |
| 294 | + "sealevelchange.plot.imshow()" |
| 295 | + ] |
| 296 | + }, |
| 297 | + { |
| 298 | + "cell_type": "markdown", |
| 299 | + "id": "b4361786-c889-4ae7-a704-dcbda50513da", |
| 300 | + "metadata": {}, |
| 301 | + "source": [ |
| 302 | + "Notice the blue parts between 40°S and 60°S where sea level has dropped?\n", |
| 303 | + "That may be related to the Antarctic ice sheet losing mass and exerting a weaker\n", |
| 304 | + "gravitational pull, which leads to a relative decrease in sea level nearby. Over most\n", |
| 305 | + "of the Northern Hemisphere though, sea level rises between 2015 and 2100." |
| 306 | + ] |
| 307 | + }, |
| 308 | + { |
| 309 | + "cell_type": "markdown", |
| 310 | + "id": "a87aa0a3-c82e-4da0-a5d0-31e42039feae", |
| 311 | + "metadata": {}, |
| 312 | + "source": [ |
| 313 | + "That's all! Hopefully this will get you started on accessing more cloud-native datasets!" |
| 314 | + ] |
| 315 | + } |
| 316 | + ], |
| 317 | + "metadata": { |
| 318 | + "language_info": { |
| 319 | + "codemirror_mode": { |
| 320 | + "name": "ipython", |
| 321 | + "version": 3 |
| 322 | + }, |
| 323 | + "file_extension": ".py", |
| 324 | + "mimetype": "text/x-python", |
| 325 | + "name": "python", |
| 326 | + "nbconvert_exporter": "python", |
| 327 | + "pygments_lexer": "ipython3" |
| 328 | + } |
| 329 | + }, |
| 330 | + "nbformat": 4, |
| 331 | + "nbformat_minor": 5 |
| 332 | +} |
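
The store URL passed to `fs.get_mapper` in the notebook above appears to encode the same facets that the catalogue CSV exposes as columns (`variable_id`, `experiment_id`, and so on). As a rough sketch, the path can be split back into those facets using only the standard library. The helper name `parse_zstore`, the `FACETS` tuple, and the assumed path layout are illustrative assumptions inferred from that one URL, not part of the notebook or of any library.

```python
# Hypothetical helper: the facet order below is an assumption inferred from the
# single store URL used in the notebook, not a documented guarantee.
FACETS = (
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",  # kept as it appears in the path, including the leading "v"
)

PREFIX = "gs://cmip6/CMIP6/"


def parse_zstore(url):
    """Split a CMIP6 Zarr store URL into a dict of facet name -> value."""
    if not url.startswith(PREFIX):
        raise ValueError(f"unexpected store prefix: {url!r}")
    # Drop the bucket prefix and any trailing slash, then split on "/".
    parts = url[len(PREFIX):].strip("/").split("/")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} path components, got {len(parts)}")
    return dict(zip(FACETS, parts))


facets = parse_zstore(
    "gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp585/r1i1p1f1/Omon/zos/gn/v20180701/"
)
print(facets["source_id"], facets["variable_id"], facets["experiment_id"])
```

A check like this can be handy for confirming that a URL copied from the catalogue really points at the variable and experiment you filtered for before opening it with `xr.open_zarr`.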