Commit 88cc396

DOC: small improvements to the netCDF docs (#1169)
1 parent 8192190 commit 88cc396


doc/io.rst

Lines changed: 42 additions & 21 deletions
@@ -3,11 +3,9 @@
 Serialization and IO
 ====================
 
-xarray supports direct serialization and IO to several file formats. For more
-options, consider exporting your objects to pandas (see the preceding section)
-and using its broad range of `IO tools`__.
-
-__ http://pandas.pydata.org/pandas-docs/stable/io.html
+xarray supports direct serialization and IO to several file formats, from
+simple :ref:`io.pickle` files to the more flexible :ref:`io.netcdf`
+format.
 
 .. ipython:: python
     :suppress:
@@ -17,6 +15,8 @@ __ http://pandas.pydata.org/pandas-docs/stable/io.html
     import xarray as xr
     np.random.seed(123456)
 
+.. _io.pickle:
+
 Pickle
 ------
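
For context, the Pickle section this hunk labels describes round-tripping xarray objects through Python's ``pickle`` module. A minimal sketch, assuming a small ``ds`` Dataset of my own invention rather than the docs' example object::

    import pickle

    import numpy as np
    import xarray as xr

    # Build a small example Dataset to serialize (names are illustrative only).
    ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 5))},
                    coords={'x': [10, 20, 30, 40]})

    # Round-trip through pickle; this works for any xarray object.
    restored = pickle.loads(pickle.dumps(ds))
    assert restored.identical(ds)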

@@ -56,8 +56,6 @@ and lets you use xarray objects with Python modules like
 Dictionary
 ----------
 
-Serializing an xarray object to a Python dictionary is also simple.
-
 We can convert a ``Dataset`` (or a ``DataArray``) to a dict using
 :py:meth:`~xarray.Dataset.to_dict`:
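
As a rough illustration of the conversion described in this section (the ``ds`` object below is a hypothetical small Dataset, and ``from_dict`` is assumed to be available as the companion constructor in your xarray version)::

    import json

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({'foo': ('x', np.arange(3))}, coords={'x': [1, 2, 3]})

    # All array values become plain lists in the resulting dict ...
    d = ds.to_dict()

    # ... so it can be serialized with, e.g., the standard json module.
    text = json.dumps(d)

    # The dict can also be turned back into a Dataset.
    roundtrip = xr.Dataset.from_dict(d)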

@@ -79,28 +77,38 @@ require external libraries and dicts can easily be pickled, or converted to
 json, or geojson. All the values are converted to lists, so dicts might
 be quite large.
 
+.. _io.netcdf:
+
 netCDF
 ------
 
-Currently, the only disk based serialization format that xarray directly supports
-is `netCDF`__. netCDF is a file format for fully self-described datasets that
-is widely used in the geosciences and supported on almost all platforms. We use
-netCDF because xarray was based on the netCDF data model, so netCDF files on disk
-directly correspond to :py:class:`~xarray.Dataset` objects. Recent versions of
+The recommended way to store xarray data structures is `netCDF`__, which
+is a binary file format for self-described datasets that originated
+in the geosciences. xarray is based on the netCDF data model, so netCDF files
+on disk directly correspond to :py:class:`~xarray.Dataset` objects.
+
+NetCDF is supported on almost all platforms, and parsers exist
+for the vast majority of scientific programming languages. Recent versions of
 netCDF are based on the even more widely used HDF5 file-format.
 
 __ http://www.unidata.ucar.edu/software/netcdf/
 
-Reading and writing netCDF files with xarray requires the
-`netCDF4-Python`__ library or scipy to be installed.
+.. tip::
+
+    If you aren't familiar with this data format, the `netCDF FAQ`_ is a good
+    place to start.
+
+.. _netCDF FAQ: http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF
+
+Reading and writing netCDF files with xarray requires scipy or the
+`netCDF4-Python`__ library to be installed (the latter is required to
+read/write netCDF V4 files and use the compression options described below).
 
 __ https://github.com/Unidata/netcdf4-python
 
 We can save a Dataset to disk using the
 :py:attr:`Dataset.to_netcdf <xarray.Dataset.to_netcdf>` method:
 
-.. use verbatim because readthedocs doesn't have netCDF4 support
-
 .. ipython:: python
 
     ds.to_netcdf('saved_on_disk.nc')
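
To make the workflow around ``to_netcdf`` concrete, here is a hedged sketch of a full write/read round trip; the example Dataset and variable names are mine, not the ones built earlier in the docs::

    import numpy as np
    import xarray as xr

    # A small example Dataset (contents are illustrative only).
    ds = xr.Dataset({'temperature': (('time', 'x'), np.random.randn(3, 4))},
                    coords={'x': [10, 20, 30, 40]})

    # Write to disk; xarray picks an engine automatically (netCDF4 if
    # installed, otherwise scipy, which only handles netCDF3 files).
    ds.to_netcdf('saved_on_disk.nc')

    # Read it back lazily as a new Dataset.
    ds_disk = xr.open_dataset('saved_on_disk.nc')
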
@@ -146,8 +154,8 @@ is modified: the original file on disk is never touched.
 
 xarray's lazy loading of remote or on-disk datasets is often but not always
 desirable. Before performing computationally intense operations, it is
-often a good idea to load a dataset entirely into memory by invoking the
-:py:meth:`~xarray.Dataset.load` method.
+often a good idea to load a Dataset (or DataArray) entirely into memory by
+invoking the :py:meth:`~xarray.Dataset.load` method.
 
 Datasets have a :py:meth:`~xarray.Dataset.close` method to close the associated
 netCDF file. However, it's often cleaner to use a ``with`` statement:
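
A short hedged sketch of the two patterns this hunk discusses, assuming the 'saved_on_disk.nc' file written in the example above::

    import xarray as xr

    # Force the lazily loaded data into memory before heavy computation.
    ds = xr.open_dataset('saved_on_disk.nc')
    ds.load()

    # Or rely on a ``with`` statement to close the underlying file for us.
    with xr.open_dataset('saved_on_disk.nc') as ds:
        print(ds.keys())
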
@@ -393,6 +401,16 @@ We recommend installing PyNIO via conda::
 
 .. _combining multiple files:
 
+
+Formats supported by Pandas
+---------------------------
+
+For more options (tabular formats and CSV files in particular), consider
+exporting your objects to pandas and using its broad range of `IO tools`_.
+
+.. _IO tools: http://pandas.pydata.org/pandas-docs/stable/io.html
+
+
 Combining multiple files
 ------------------------
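
To illustrate the kind of hand-off this new section points to, a hedged sketch using pandas' CSV writer; the file name and variables are my own, and any of pandas' IO tools would work similarly::

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(3, 2))},
                    coords={'x': [1, 2, 3], 'y': ['a', 'b']})

    # Convert to a pandas DataFrame and hand off to any pandas writer.
    ds.to_dataframe().to_csv('example.csv')

    # Read back with pandas, then rebuild a Dataset from the DataFrame.
    df = pd.read_csv('example.csv', index_col=['x', 'y'])
    ds2 = xr.Dataset.from_dataframe(df)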

@@ -402,16 +420,19 @@ files into a single Dataset by making use of :py:func:`~xarray.concat`.
 
 .. note::
 
-    Version 0.5 includes experimental support for manipulating datasets that
+    Version 0.5 includes support for manipulating datasets that
     don't fit into memory with dask_. If you have dask installed, you can open
     multiple files simultaneously using :py:func:`~xarray.open_mfdataset`::
 
        xr.open_mfdataset('my/files/*.nc')
 
-    This function automatically concatenates and merges into a single xarray datasets.
-    For more details, see :ref:`dask.io`.
+    This function automatically concatenates and merges multiple files into a
+    single xarray dataset.
+    It is the recommended way to open multiple files with xarray.
+    For more details, see :ref:`dask.io` and a `blog post`_ by Stephan Hoyer.
 
 .. _dask: http://dask.pydata.org
+.. _blog post: http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/
 
 For example, here's how we could approximate ``MFDataset`` from the netCDF4
 library::
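
The docs' actual ``MFDataset``-style snippet falls outside this hunk's context lines and is not reproduced here. Purely as a hedged sketch of the same idea, with the helper name ``read_netcdfs`` and the ``dim`` argument as my own illustrations rather than xarray API::

    from glob import glob

    import xarray as xr

    def read_netcdfs(files, dim):
        # Open every matching file and concatenate along one dimension
        # (illustrative helper only).
        paths = sorted(glob(files))
        datasets = [xr.open_dataset(p) for p in paths]
        return xr.concat(datasets, dim)

    combined = read_netcdfs('/all/my/files/*.nc', dim='time')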
