Serialization and IO
====================

- xarray supports direct serialization and IO to several file formats. For more
- options, consider exporting your objects to pandas (see the preceding section)
- and using its broad range of `IO tools`__.
-
- __ http://pandas.pydata.org/pandas-docs/stable/io.html
+ xarray supports direct serialization and IO to several file formats, from
+ simple :ref:`io.pickle` files to the more flexible :ref:`io.netcdf`
+ format.

.. ipython:: python
    :suppress:
@@ -17,6 +15,8 @@ __ http://pandas.pydata.org/pandas-docs/stable/io.html
    import xarray as xr
    np.random.seed(123456)

+ .. _io.pickle:
+
Pickle
------

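The pickle round trip covered in this section can be sketched as follows (a minimal example, assuming xarray and numpy are installed; the dataset contents are illustrative):

```python
import pickle

import numpy as np
import xarray as xr

# A small in-memory dataset standing in for the docs' example object.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6).reshape(2, 3))},
    coords={"x": [10, 20]},
)

# Round-trip through pickle; the restored object compares equal.
restored = pickle.loads(pickle.dumps(ds))
print(restored.identical(ds))  # prints True
```

Because pickling preserves the full object, this also works for objects wrapping lazily loaded data, though the file handles themselves are not pickled.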
@@ -56,8 +56,6 @@ and lets you use xarray objects with Python modules like
Dictionary
----------

- Serializing an xarray object to a Python dictionary is also simple.
-
We can convert a ``Dataset`` (or a ``DataArray``) to a dict using
:py:meth:`~xarray.Dataset.to_dict`:
@@ -79,28 +77,38 @@ require external libraries and dicts can easily be pickled, or converted to
json, or geojson. All the values are converted to lists, so dicts might
be quite large.

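A quick sketch of the dict round trip (assuming xarray and numpy are installed; the variable names are illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(4.0))}, coords={"x": [0, 1, 2, 3]})

# to_dict produces plain Python types (lists, strings, dicts), so the
# result can be handed to json.dumps or pickled without xarray installed
# on the receiving end.
d = ds.to_dict()

# from_dict reverses the conversion.
roundtripped = xr.Dataset.from_dict(d)
print(roundtripped.identical(ds))
```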
+ .. _io.netcdf:
+
netCDF
------

- Currently, the only disk based serialization format that xarray directly supports
- is `netCDF`__. netCDF is a file format for fully self-described datasets that
- is widely used in the geosciences and supported on almost all platforms. We use
- netCDF because xarray was based on the netCDF data model, so netCDF files on disk
- directly correspond to :py:class:`~xarray.Dataset` objects. Recent versions of
+ The recommended way to store xarray data structures is `netCDF`__, which
+ is a binary file format for self-described datasets that originated
+ in the geosciences. xarray is based on the netCDF data model, so netCDF files
+ on disk directly correspond to :py:class:`~xarray.Dataset` objects.
+
+ NetCDF is supported on almost all platforms, and parsers exist
+ for the vast majority of scientific programming languages. Recent versions of
netCDF are based on the even more widely used HDF5 file-format.

__ http://www.unidata.ucar.edu/software/netcdf/

- Reading and writing netCDF files with xarray requires the
- `netCDF4-Python`__ library or scipy to be installed.
+ .. tip::
+
+     If you aren't familiar with this data format, the `netCDF FAQ`_ is a good
+     place to start.
+
+ .. _netCDF FAQ: http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF
+
+ Reading and writing netCDF files with xarray requires scipy or the
+ `netCDF4-Python`__ library to be installed (the latter is required to
+ read/write netCDF v4 files and use the compression options described below).

__ https://github.com/Unidata/netcdf4-python

We can save a Dataset to disk using the
:py:attr:`Dataset.to_netcdf <xarray.Dataset.to_netcdf>` method:

- .. use verbatim because readthedocs doesn't have netCDF4 support
-
.. ipython:: python

    ds.to_netcdf('saved_on_disk.nc')
@@ -146,8 +154,8 @@ is modified: the original file on disk is never touched.
146
154
147
155
xarray's lazy loading of remote or on-disk datasets is often but not always
148
156
desirable. Before performing computationally intense operations, it is
149
- often a good idea to load a dataset entirely into memory by invoking the
150
- :py:meth: `~xarray.Dataset.load ` method.
157
+ often a good idea to load a Dataset (or DataArray) entirely into memory by
158
+ invoking the :py:meth: `~xarray.Dataset.load ` method.
151
159
152
160
Datasets have a :py:meth: `~xarray.Dataset.close ` method to close the associated
153
161
netCDF file. However, it's often cleaner to use a ``with `` statement:
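A minimal sketch of the write/read round trip with a ``with`` statement, assuming xarray, numpy, and at least one netCDF backend (scipy or netCDF4-Python) are installed; the file path is illustrative:

```python
import os
import tempfile

import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(5.0))})

# Write to a temporary netCDF file. to_netcdf picks netCDF4-Python if it
# is installed and falls back to scipy otherwise.
path = os.path.join(tempfile.mkdtemp(), "saved_on_disk.nc")
ds.to_netcdf(path)

# The with statement guarantees the underlying file handle is closed,
# even if an exception is raised while working with the dataset.
with xr.open_dataset(path) as on_disk:
    # load() pulls the (lazily loaded) data fully into memory before
    # the file is closed.
    loaded = on_disk.load()

print(loaded.identical(ds))
```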
@@ -393,6 +401,16 @@ We recommend installing PyNIO via conda::
.. _combining multiple files:

+
+ Formats supported by Pandas
+ ---------------------------
+
+ For more options (tabular formats and CSV files in particular), consider
+ exporting your objects to pandas and using its broad range of `IO tools`_.
+
+ .. _IO tools: http://pandas.pydata.org/pandas-docs/stable/io.html
+
+
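The pandas detour can be sketched as follows (assuming xarray, numpy, and pandas are installed; an in-memory buffer stands in for a CSV file on disk):

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(3.0))}, coords={"x": [10, 20, 30]})

# Convert to a pandas DataFrame, then use any of pandas' IO tools,
# e.g. CSV via an in-memory buffer.
buf = io.StringIO()
ds.to_dataframe().to_csv(buf)

# Read the CSV back and rebuild the Dataset from the DataFrame.
buf.seek(0)
df = pd.read_csv(buf, index_col="x")
roundtripped = xr.Dataset.from_dataframe(df)
print(roundtripped.equals(ds))
```

Note that tabular formats only preserve values and index labels, not metadata such as attributes, so ``equals`` rather than ``identical`` is the right comparison here.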
Combining multiple files
------------------------

@@ -402,16 +420,19 @@ files into a single Dataset by making use of :py:func:`~xarray.concat`.
.. note::

-     Version 0.5 includes experimental support for manipulating datasets that
+     Version 0.5 includes support for manipulating datasets that
    don't fit into memory with dask_. If you have dask installed, you can open
    multiple files simultaneously using :py:func:`~xarray.open_mfdataset`::

        xr.open_mfdataset('my/files/*.nc')

-     This function automatically concatenates and merges into a single xarray datasets.
-     For more details, see :ref:`dask.io`.
+     This function automatically concatenates and merges multiple files into a
+     single xarray dataset.
+     It is the recommended way to open multiple files with xarray.
+     For more details, see :ref:`dask.io` and a `blog post`_ by Stephan Hoyer.

    .. _dask: http://dask.pydata.org
+     .. _blog post: http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/

For example, here's how we could approximate ``MFDataset`` from the netCDF4
library::