Doc updates #412

Merged
merged 5 commits into from
May 31, 2015
4 changes: 2 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,11 @@ matrix:
- python: 2.7
# nb. we have to remove scipy because conda install pandas brings it in:
# https://github.com/ContinuumIO/anaconda-issues/issues/145
env: UPDATE_ENV="conda remove scipy netCDF4 && pip install dask"
env: UPDATE_ENV="conda remove scipy netCDF4 && conda install dask"
- python: 3.3
env: UPDATE_ENV="conda remove netCDF4"
- python: 3.4
env: UPDATE_ENV="conda install -c pandas bottleneck h5py cython && pip install cyordereddict h5netcdf dask"
env: UPDATE_ENV="conda install -c pandas bottleneck h5py cython dask && pip install cyordereddict h5netcdf"
# don't require pydap tests to pass because the dap test server is unreliable
- python: 2.7
env: UPDATE_ENV="pip install pydap"
Expand Down
80 changes: 13 additions & 67 deletions doc/data-structures.rst
Original file line number Diff line number Diff line change
Expand Up @@ -257,12 +257,6 @@ instead of tuples:

xray.Dataset({'bar': foo})

Or directly convert a data array into a dataset:

.. ipython:: python

foo.to_dataset(name='bar')

You can also create a dataset from a :py:class:`pandas.DataFrame` with
:py:meth:`Dataset.from_dataframe <xray.Dataset.from_dataframe>` or from a
netCDF file on disk with :py:func:`~xray.open_dataset`. See
Expand Down Expand Up @@ -340,16 +334,14 @@ a ``Dataset`` variable using ``__setitem__`` or ``update`` will
:ref:`automatically align<update>` the array(s) to the original
dataset's indexes.

Creating modified datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~
Modifying datasets
~~~~~~~~~~~~~~~~~~

You can copy a ``Dataset`` by using the :py:meth:`~xray.Dataset.copy` method:

.. ipython:: python

ds2 = ds.copy()
del ds2['time']
ds2
ds.copy()

By default, the copy is shallow, so only the container will be copied: the
arrays in the ``Dataset`` will still be stored in the same underlying
Expand All @@ -366,7 +358,7 @@ operations keep around coordinates:
ds[['temperature']]
ds[['x']]

If a dimension name is given as an argument to `drop`, it also drops all
If a dimension name is given as an argument to ``drop``, it also drops all
variables that use that dimension:

.. ipython:: python
Expand All @@ -379,6 +371,13 @@ Another useful option is the ability to rename the variables in a dataset:

ds.rename({'temperature': 'temp', 'precipitation': 'precip'})

Finally, you can use :py:meth:`~xray.Dataset.swap_dims` to swap dimension and non-dimension variables:

.. ipython:: python

ds.coords['day'] = ('time', [6, 7, 8])
ds.swap_dims({'time': 'day'})

.. _coordinates:

Coordinates
Expand Down Expand Up @@ -406,8 +405,8 @@ associated with coordinates. Coordinates with names not matching a dimension
are not used for alignment or indexing, nor are they required to match when
doing arithmetic (see :ref:`coordinates math`).

Converting to ``pandas.Index``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Indexes
~~~~~~~

To convert a coordinate (or any ``DataArray``) into an actual
:py:class:`pandas.Index`, use the :py:meth:`~xray.DataArray.to_index` method:
Expand All @@ -423,56 +422,3 @@ dimension and whose the values are ``Index`` objects:
.. ipython:: python

ds.indexes

Switching between data and coordinate variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To entirely add or remove coordinate arrays, you can use dictionary-like
syntax, as shown above. To convert back and forth between data and
coordinates, use the :py:meth:`~xray.Dataset.set_coords` and
:py:meth:`~xray.Dataset.reset_coords` methods:

.. ipython:: python

ds.reset_coords()
ds.set_coords(['temperature', 'precipitation'])
ds['temperature'].reset_coords(drop=True)

Notice that these operations skip coordinates with names given by dimensions,
as used for indexing. This is mostly because we are not entirely sure how to
design the interface around the fact that xray cannot store a coordinate and
variable with the same name but different values in the same dictionary. But we do
recognize that supporting something like this would be useful.

Converting into datasets
~~~~~~~~~~~~~~~~~~~~~~~~

``Coordinates`` objects also have a few useful methods, mostly for converting
them into dataset objects:

.. ipython:: python

ds.coords.to_dataset()

The merge method is particularly interesting, because it implements the same
logic used for merging coordinates in arithmetic operations
(see :ref:`comput`):

.. ipython:: python

alt = xray.Dataset(coords={'z': [10], 'lat': 0, 'lon': 0})
ds.coords.merge(alt.coords)

The ``coords.merge`` method may be useful if you want to implement your own
binary operations that act on xray objects. In the future, we hope to write
more helper functions so that you can easily make your functions act like
xray's built-in arithmetic.


.. [1] Latitude and longitude are 2D arrays because the dataset uses
`projected coordinates`__. ``reference_time`` refers to the reference time
at which the forecast was made, rather than ``time`` which is the valid time
for which the forecast applies.

__ http://en.wikipedia.org/wiki/Map_projection

16 changes: 0 additions & 16 deletions doc/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,19 +132,3 @@ pandas) make it a faster and more flexible data analysis tool. That said, Iris
and CDAT have some great domain specific functionality, and we would love to
have support for converting their native objects to and from xray (see
:issue:`37` and :issue:`133`)


Does xray support out-of-core computation?
------------------------------------------

Not yet! Distributed and out-of-memory computation is certainly something we're
excited about, but for now we have focused on making xray a full-featured tool for
in-memory analytics (like pandas).

We have some ideas for what out-of-core support could look like (probably
through a library like biggus_ or Blaze_), but we're not there yet. An
intermediate step would be supporting incremental writes to a Dataset linked to
a NetCDF file on disk.

.. _biggus: https://github.com/SciTools/biggus
.. _Blaze: https://github.com/continuumio/blaze
1 change: 1 addition & 0 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ Documentation
computation
groupby
combining
reshaping
time-series
pandas
io
Expand Down
7 changes: 7 additions & 0 deletions doc/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,11 @@ DataArray:
arr[0, 0]
arr[:, [2, 1]]

.. warning::

Positional indexing deviates from NumPy when indexing with multiple
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`.
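
To make the warning concrete, here is a minimal plain-Python sketch of the two ways ``arr[[0, 1], [0, 1]]`` can be interpreted. It assumes that xray applies each index array independently along its own dimension (orthogonal indexing), which is our reading of the indexing details section. The helpers ``orthogonal_index`` and ``pointwise_index`` are hypothetical names used only for illustration; they are not part of xray or NumPy.

```python
def orthogonal_index(rows, row_idx, col_idx):
    """Select every combination of the requested rows and columns."""
    return [[rows[i][j] for j in col_idx] for i in row_idx]


def pointwise_index(rows, row_idx, col_idx):
    """Pair the index lists element by element, as NumPy does for arr[[...], [...]]."""
    return [rows[i][j] for i, j in zip(row_idx, col_idx)]


# a 2 x 3 array stored as nested lists
data = [[0, 1, 2],
        [3, 4, 5]]

outer = orthogonal_index(data, [0, 1], [0, 1])
paired = pointwise_index(data, [0, 1], [0, 1])
print(outer)   # [[0, 1], [3, 4]]
print(paired)  # [0, 4]
```

The orthogonal result keeps both dimensions (a 2 x 2 block), while the pointwise result collapses to a single list of paired elements.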

xray also supports label-based indexing, just like pandas. Because
we use a :py:class:`pandas.Index` under the hood, label based indexing is very
fast. To do label based indexing, use the :py:attr:`~xray.DataArray.loc` attribute:
Expand Down Expand Up @@ -165,6 +170,8 @@ index labels along a dimension dropped:

``drop`` is both a ``Dataset`` and ``DataArray`` method.

.. _indexing details:

Indexing details
----------------

Expand Down
123 changes: 123 additions & 0 deletions doc/reshaping.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
Reshaping and reorganizing data
===============================

.. ipython:: python
:suppress:

import numpy as np
import pandas as pd
import xray
np.random.seed(123456)
np.set_printoptions(threshold=10)

We'll return to our example dataset from :ref:`data structures`:

.. ipython:: python

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

# for real use cases, it's good practice to supply array attributes such as
# units, but we won't bother here for the sake of brevity
ds = xray.Dataset({'temperature': (['x', 'y', 'time'], temp),
'precipitation': (['x', 'y', 'time'], precip)},
coords={'lon': (['x', 'y'], lon),
'lat': (['x', 'y'], lat),
'time': pd.date_range('2014-09-06', periods=3),
'reference_time': pd.Timestamp('2014-09-05')})

Converting between Dataset and DataArray
----------------------------------------

To convert from a Dataset to a DataArray, use :py:meth:`~xray.Dataset.to_array`:

.. ipython:: python

arr = ds.to_array()
arr

This method broadcasts all data variables in the dataset against each other,
then concatenates them along a new dimension into a new array while preserving
coordinates.

To convert back from a DataArray to a Dataset, use
:py:meth:`~xray.DataArray.to_dataset`:

.. ipython:: python

arr.to_dataset(dim='variable')

.. note::

The broadcasting behavior of ``to_array`` means that the resulting array
includes the union of data variable dimensions:

.. ipython:: python

ds2 = xray.Dataset({'a': 0, 'b': ('x', [3, 4, 5])})

# the input dataset has 4 elements
ds2

# the resulting array has 6 elements
ds2.to_array()

Otherwise, the result could not be represented as an orthogonal array.
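
The element counts in the note above can be reproduced with a minimal plain-Python sketch of "broadcast, then stack". This is not xray's implementation: ``broadcast_and_stack`` is a hypothetical helper that handles only scalars and 1-D lists along a single shared dimension, which is enough to show why a 4-element input becomes a 6-element array.

```python
def broadcast_and_stack(variables, length):
    """Hypothetical helper: broadcast each variable to `length`, then stack.

    `variables` maps names to either a scalar or a list of `length` values.
    Returns (names, rows), where rows[i] is the broadcast data for names[i].
    """
    names = list(variables)
    rows = []
    for name in names:
        value = variables[name]
        if isinstance(value, list):
            if len(value) != length:
                raise ValueError("%r has length %d, expected %d"
                                 % (name, len(value), length))
            rows.append(list(value))
        else:
            # a scalar is repeated along the shared dimension
            rows.append([value] * length)
    return names, rows


# mirrors a dataset with a scalar 'a' and a length-3 'b' along 'x'
names, stacked = broadcast_and_stack({'a': 0, 'b': [3, 4, 5]}, length=3)
print(names)    # ['a', 'b']
print(stacked)  # [[0, 0, 0], [3, 4, 5]]

# splitting the rows back out by name does not recover the scalar:
# in this sketch 'a' stays broadcast along the shared dimension
restored = dict(zip(names, stacked))
print(restored['a'])  # [0, 0, 0]
```

In this sketch the stacked result is rectangular (2 x 3), so every variable must occupy a full row even if it started out as a scalar.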

If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray will be converted into a Dataset of one variable:

.. ipython:: python

arr.to_dataset(name='combined')

Coordinate variables
--------------------

To entirely add or remove coordinate arrays, you can use dictionary-like
syntax, as shown in . To convert back and forth between data and
coordinates, use the :py:meth:`~xray.Dataset.set_coords` and
:py:meth:`~xray.Dataset.reset_coords` methods:

.. ipython:: python

ds.reset_coords()
ds.set_coords(['temperature', 'precipitation'])
ds['temperature'].reset_coords(drop=True)

Notice that these operations skip coordinates with names given by dimensions,
as used for indexing. This is mostly because we are not entirely sure how to
design the interface around the fact that xray cannot store a coordinate and
variable with the same name but different values in the same dictionary. But we do
recognize that supporting something like this would be useful.

``Coordinates`` objects also have a few useful methods, mostly for converting
them into dataset objects:

.. ipython:: python

ds.coords.to_dataset()

The merge method is particularly interesting, because it implements the same
logic used for merging coordinates in arithmetic operations
(see :ref:`comput`):

.. ipython:: python

alt = xray.Dataset(coords={'z': [10], 'lat': 0, 'lon': 0})
ds.coords.merge(alt.coords)

The ``coords.merge`` method may be useful if you want to implement your own
binary operations that act on xray objects. In the future, we hope to write
more helper functions so that you can easily make your functions act like
xray's built-in arithmetic.


.. [1] Latitude and longitude are 2D arrays because the dataset uses
`projected coordinates`__. ``reference_time`` refers to the reference time
at which the forecast was made, rather than ``time`` which is the valid time
for which the forecast applies.

__ http://en.wikipedia.org/wiki/Map_projection

12 changes: 7 additions & 5 deletions xray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1742,17 +1742,19 @@ def ensure_common_dims(vars):

return concatenated

def to_array(self, dim='variable'):
def to_array(self, dim='variable', name=None):
"""Convert this dataset into an xray.DataArray

The data variables of this dataset will be stacked along the first
axis of the new array. All coordinates of this dataset will remain
coordinates.
The data variables of this dataset will be broadcast against each other
and stacked along the first axis of the new array. All coordinates of
this dataset will remain coordinates.

Parameters
----------
dim : str, optional
Name of the new dimension.
name : str, optional
Name of the new data array.

Returns
-------
Expand All @@ -1769,7 +1771,7 @@ def to_array(self, dim='variable'):

dims = (dim,) + broadcast_vars[0].dims

return DataArray(data, coords, dims, attrs=self.attrs)
return DataArray(data, coords, dims, attrs=self.attrs, name=name)

def _to_dataframe(self, ordered_dims):
columns = [k for k in self if k not in self.dims]
Expand Down
4 changes: 2 additions & 2 deletions xray/test/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1550,8 +1550,8 @@ def test_to_array(self):
actual = ds.to_array()
self.assertDataArrayIdentical(expected, actual)

actual = ds.to_array('abc')
expected = expected.rename({'variable': 'abc'})
actual = ds.to_array('abc', name='foo')
expected = expected.rename({'variable': 'abc'}).rename('foo')
self.assertDataArrayIdentical(expected, actual)

def test_to_and_from_dataframe(self):
Expand Down