Stephan's sprintbattical #12

Merged: 45 commits, Feb 21, 2014

Commits
920fd42 Tweaks to Variable and Dataset repr (shoyer, Jan 31, 2014)
e330438 Add DataView and indexing (shoyer, Feb 2, 2014)
63e2b15 Orthogonal indexing and cached indices (shoyer, Feb 3, 2014)
0192e25 Added utils.num2datetimeindex and collapse operators (shoyer, Feb 4, 2014)
900b61e Simpler Dataset.views + working non-trivial indices + aggregated_by (shoyer, Feb 5, 2014)
0fbea90 Refactored backends for more consistency/flexibility (shoyer, Feb 5, 2014)
8c5ba8f Added virtual variables (shoyer, Feb 5, 2014)
fdfd4ce Added utils.Frozen (shoyer, Feb 5, 2014)
0e87356 Moved data.py to dataset.py (shoyer, Feb 5, 2014)
0354aa0 Fixed iterator and removed broken methods (shoyer, Feb 5, 2014)
f626652 Renamed views & loc_views to indexed_by and labeled_by (shoyer, Feb 6, 2014)
883f0fe More flexible __getitem__ and __setitem__ (shoyer, Feb 6, 2014)
c4794c5 Patched in more numpy methods (shoyer, Feb 6, 2014)
d517bd1 Faster aggregate (shoyer, Feb 6, 2014)
ba7083d More array interface tests for ufuncs (shoyer, Feb 6, 2014)
dcdabee Fixed DataView.from_stack (shoyer, Feb 7, 2014)
697f135 Initial docs with Sphinx (shoyer, Feb 7, 2014)
941e833 `to_dataframe` method implements pandas.DataFrame export (shoyer, Feb 7, 2014)
d6c0b82 New README.md (shoyer, Feb 7, 2014)
fb3289d Fixed intersection (shoyer, Feb 7, 2014)
b5ff8d9 Renamed DataView.replace_focus to DataView.refocus (shoyer, Feb 7, 2014)
3c5856f README edits (shoyer, Feb 7, 2014)
828c6dd Indexing bug fixes (shoyer, Feb 8, 2014)
ef5ac51 Fully traverse dataset graphs with Dataset.select (shoyer, Feb 8, 2014)
a4b6ad9 DataView.from_stack can concatenate along existing dimensions, too (shoyer, Feb 9, 2014)
f683b07 Added variable.T as shortcut for variable.transpose() (shoyer, Feb 10, 2014)
49edf3d Added Variable.apply and DataView.apply (shoyer, Feb 10, 2014)
1e2c47a Revised and extended new README (shoyer, Feb 10, 2014)
4cd1361 Renamed "Variable" -> "Array" and "DataView" -> "DatasetArray" (shoyer, Feb 10, 2014)
ad9a913 Array.groupby (shoyer, Feb 13, 2014)
f08e2eb Renamed package from 'scidata' to 'xray' (shoyer, Feb 14, 2014)
cf3d6e2 Updated required numpy version to 1.8 (shoyer, Feb 15, 2014)
0bd4c29 Added TODO notes (shoyer, Feb 15, 2014)
0af583e Simplified Dataset (shoyer, Feb 16, 2014)
dd81b0f Added utils.datetimeindex2num (shoyer, Feb 16, 2014)
782a933 Reworked virtual variables (shoyer, Feb 16, 2014)
e8738db Fix performance regression in array_._as_compatible_data (shoyer, Feb 16, 2014)
508e16f Speedup orthogonal indexing (shoyer, Feb 16, 2014)
0547123 Documentation and name cleanup (shoyer, Feb 16, 2014)
039bdd0 Better tests for GroupBy; bug fixes (shoyer, Feb 16, 2014)
fdba77f Removed aggregate and iterator (they are replaced by groupby) (shoyer, Feb 16, 2014)
1ab3f4d to_dataframe() no longer creates a large empty array (shoyer, Feb 19, 2014)
d8abfd3 added DatasetArray.to_series() method (shoyer, Feb 20, 2014)
3f5bea2 added unused context argument to __array_wrap__ (shoyer, Feb 20, 2014)
9488463 Update setup.py (akleeman, Feb 21, 2014)

2 changes: 2 additions & 0 deletions .gitignore
@@ -33,3 +33,5 @@ nosetests.xml
.mr.developer.cfg
.project
.pydevproject

doc/_build
97 changes: 76 additions & 21 deletions README.md
@@ -1,21 +1,76 @@
scidata
=======

Objects for holding self-describing scientific data in Python. The goal of this project is to
provide a Common Data Model (http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/),
allowing users to read, write, and manipulate netCDF-like data without worrying about where the
data source lives. Datasets that are too large to fit in memory, served from an OpenDAP server,
streamed, or stored as NetCDF3, NetCDF4, grib (?), HDF5, or other formats can all be inspected
and manipulated using the same methods.

Of course, there are already several packages in Python that offer similar functionality
(netCDF4, scipy.io, pupynere, iris, ...), but each of those packages has its own shortcomings:

netCDF4
Doesn't allow streaming: if you want to create a new object, it needs to live on disk.
scipy.io / pupynere
Only works with NetCDF3 and doesn't support DAP, making it difficult to work with large datasets.
iris
is REALLY close to what this project will provide, but iris strays further from the CDM
than I would like. (If you read then write a netCDF file using iris, all global attributes
are pushed down to variable-level attributes.)
# xray: transparently manipulate scientific datasets in Python

**xray** is a Python package for working with aligned sets of homogeneous,
n-dimensional arrays. It implements flexible array operations and dataset
manipulation for in-memory datasets within the [Common Data Model][cdm] widely
used for self-describing scientific data (netCDF, OpenDAP, etc.).

***Warning: xray is still in its early development phase. Expect the API to
change.***

## Main Features

- A `DatasetArray` object that is compatible with NumPy's ndarray and ufuncs
  but keeps ancillary variables and metadata intact.
- Array broadcasting based on dimension names and coordinate indices
  instead of only shapes.
- Flexible split-apply-combine functionality with the `Array.groupby` method
  (patterned after [pandas][pandas]).
- Fast label-based indexing and (limited) time-series functionality built on
  [pandas][pandas].
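
A hypothetical sketch of the features above in action. The `Array`, `Dataset`, and `DatasetArray` names come from the commits in this PR, but the constructor signatures and the `'time.month'` virtual variable here are assumptions for illustration, not verbatim API:

```python
import numpy as np
import pandas as pd
import xray  # package name as of commit f08e2eb

# Assumed constructors: an Array pairs named dimensions with an ndarray,
# and a Dataset maps variable names to arrays.
times = xray.Array(['time'], pd.date_range('2000-01-01', periods=365))
temp = xray.Array(['time', 'lat', 'lon'], np.random.rand(365, 10, 20))
ds = xray.Dataset({'time': times, 'temperature': temp})

# ndarray/ufunc compatibility: metadata stays attached through the math.
kelvin = ds['temperature'] + 273.15

# Split-apply-combine (Array.groupby, commit ad9a913), assuming a
# 'time.month' virtual variable like the ones added in this PR.
monthly = ds['temperature'].groupby('time.month').mean()
```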

## Design Goals

- Provide a data analysis toolkit as fast and powerful as pandas but
  designed for working with datasets of aligned, homogeneous N-dimensional
  arrays.
- Whenever possible, build on top of and interoperate with pandas and the
  rest of the awesome [scientific python stack][scipy].
- Provide a uniform API for loading and saving scientific data in a variety
  of formats (including streaming data).
- Use metadata according to [conventions][cf] when appropriate, but don't
  strictly enforce them. Conflicting attributes (e.g., units) should be
  silently dropped instead of causing errors. The onus is on the user to
  make sure that operations make sense.
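
A small sketch of the last point (the drop-on-conflict behavior is the stated goal here; the `Array` signature and `attributes` dict are assumed names, not confirmed API):

```python
import numpy as np
import xray

# Two arrays whose 'units' attributes disagree (assumed Array signature).
a = xray.Array(['x'], np.ones(3), attributes={'units': 'degC'})
b = xray.Array(['x'], np.ones(3), attributes={'units': 'K'})

# Per the goal above, this should not raise: the conflicting 'units'
# attribute is silently dropped from the result, and spotting the
# physical mismatch is left to the user.
total = a + b
assert 'units' not in total.attributes
```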

## Prior Art

- [Iris][iris] (supported by the UK Met Office) is a similar package
  designed for working with geophysical datasets in Python. Iris provided
  much of the inspiration for xray (e.g., xray's `DatasetArray` is largely
  based on the Iris `Cube`), but it has several limitations that led us to
  build xray instead of extending Iris:
  1. Iris has essentially one first-class object (the `Cube`) on which it
     attempts to build all functionality (`Coord` supports a much more
     limited set of functionality). xray has its equivalent of the Cube
     (the `DatasetArray` object), but it is only a thin wrapper on the more
     primitive building blocks of Dataset and Array objects.
  2. Iris has a strict interpretation of [CF conventions][cf], which,
     although a principled choice, we have found to be impractical for
     everyday use. With Iris, every quantity has physical (SI) units, all
     coordinates have cell-bounds, and all metadata (units, cell-bounds and
     other attributes) is required to match before merging or doing
     operations on multiple cubes. This means that a lot of time with Iris
     is spent figuring out why cubes are incompatible and explicitly
     removing possibly conflicting metadata.
  3. Iris can be slow and complex. Strictly interpreting metadata requires
     a lot of work, and (in our experience) it can be difficult to build
     mental models of how Iris functions work. Moreover, it means that a
     lot of logic (e.g., constraint handling) uses non-vectorized
     operations. For example, extracting all times within a range can be
     surprisingly slow (e.g., 0.3 seconds in Iris vs. 3 milliseconds in
     xray to select along a time dimension with 10000 elements; see the
     sketch after this list).
- [pandas][pandas] is fast and powerful but oriented around working with
  tabular datasets. pandas has experimental N-dimensional panels, but they
  don't support aligned math with other objects. We believe the
  `DatasetArray`/`Cube` model is better suited to working with scientific
  datasets. We use pandas internally in xray to support fast indexing.
- [netCDF4-python][nc4] provides xray's primary interface for working with
  netCDF and OpenDAP datasets.
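
For scale on the timing comparison above, a sketch of the kind of label-based selection being measured. The `labeled_by` name comes from commit f626652; the construction and exact signatures are assumptions, as in the earlier sketch:

```python
import numpy as np
import pandas as pd
import xray

# Assumed construction: an hourly time coordinate with 10000 elements.
times = xray.Array(['time'], pd.date_range('2000-01-01', periods=10000, freq='H'))
ds = xray.Dataset({'time': times,
                   'temp': xray.Array(['time'], np.random.rand(10000))})

# Label-based selection along 'time', backed by a pandas index; this is
# the ~3 ms operation cited above.
subset = ds['temp'].labeled_by(time=slice('2000-03-01', '2000-06-30'))
```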

[pandas]: http://pandas.pydata.org/
[cdm]: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/
[cf]: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html
[scipy]: http://scipy.org/
[nc4]: http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
[iris]: http://scitools.org.uk/iris/
177 changes: 177 additions & 0 deletions doc/Makefile
@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf latexpdfja text man texinfo info gettext changes xml pseudoxml linkcheck doctest

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html        to make standalone HTML files"
	@echo "  dirhtml     to make HTML files named index.html in directories"
	@echo "  singlehtml  to make a single large HTML file"
	@echo "  pickle      to make pickle files"
	@echo "  json        to make JSON files"
	@echo "  htmlhelp    to make HTML files and a HTML help project"
	@echo "  qthelp      to make HTML files and a qthelp project"
	@echo "  devhelp     to make HTML files and a Devhelp project"
	@echo "  epub        to make an epub"
	@echo "  latex       to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf    to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja  to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text        to make text files"
	@echo "  man         to make manual pages"
	@echo "  texinfo     to make Texinfo files"
	@echo "  info        to make Texinfo files and run them through makeinfo"
	@echo "  gettext     to make PO message catalogs"
	@echo "  changes     to make an overview of all changed/added/deprecated items"
	@echo "  xml         to make Docutils-native XML files"
	@echo "  pseudoxml   to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck   to check all external links for integrity"
	@echo "  doctest     to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/scidata.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/scidata.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/scidata"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/scidata"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."