From 1545aa7f91942ad8529867e4c77d813e0cf756a5 Mon Sep 17 00:00:00 2001
From: jreback
Date: Tue, 29 Apr 2014 18:13:51 -0400
Subject: [PATCH] DOC: add notes to the groupby.rst docs

---
 doc/source/groupby.rst | 49 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 8f06debb9742a..286ae10cfc5a8 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -344,8 +344,9 @@ Aggregation
 -----------
 
 Once the GroupBy object has been created, several methods are available to
-perform a computation on the grouped data. An obvious one is aggregation via
-the ``aggregate`` or equivalently ``agg`` method:
+perform a computation on the grouped data.
+
+An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method:
 
 .. ipython:: python
 
@@ -382,6 +383,22 @@ index are the group names and whose values are the sizes of each group.
 
    grouped.size()
 
+.. ipython:: python
+
+   grouped.describe()
+
+.. note::
+
+   Aggregation functions will **not** return the groups that you are aggregating over
+   if they are named *columns* when ``as_index=True`` (the default). The grouped columns will
+   be the **indices** of the returned object.
+
+   Aggregating functions are ones that reduce the dimension of the returned objects,
+   for example: ``mean, sum, size, count, std, var, describe, first, last, min, max``. This is
+   very much like performing a reducing operation on a ``DataFrame`` and getting a ``Series`` back.
+
+   Passing ``as_index=False`` **will** return the groups that you are aggregating over if they are
+   named *columns*.
 
 .. _groupby.aggregate.multifunc:
 
@@ -537,6 +554,16 @@ and that the transformed data contains no NAs.
    grouped_trans.count() # counts after transformation
    grouped_trans.size() # Verify non-NA count equals group size
 
+.. note::
+
+   Some functions, when applied to a groupby object, will automatically transform the input, returning
+   an object of the same shape as the original. For example: ``fillna, ffill, bfill, shift``.
+   Passing ``as_index=False`` will not affect these transformation methods.
+
+   .. ipython:: python
+
+      grouped.ffill()
+
 .. _groupby.filter:
 
 Filtration
@@ -579,6 +606,18 @@ For dataframes with multiple columns, filters should explicitly specify a column
    dff['C'] = np.arange(8)
    dff.groupby('B').filter(lambda x: len(x['C']) > 2)
 
+.. note::
+
+   Some functions, when applied to a groupby object, will act as a **filter** on the input, returning
+   a reduced shape of the original (and potentially eliminating groups), but with the index unchanged.
+   Passing ``as_index=False`` will not affect these filtration methods.
+   For example: ``head, tail, nth``.
+
+   .. ipython:: python
+
+      dff.groupby('B').head(2)
+
+
 .. _groupby.dispatch:
 
 Dispatching to instance methods
@@ -664,6 +703,12 @@ The dimension of the returned result can also change:
 
    s.apply(f)
 
+.. note::
+
+   ``apply`` can act as a reducer, transformer, *or* filter function, depending on exactly what is passed to it.
+   Depending on the path taken, and on exactly what you are grouping, the grouped column(s) may be included in
+   the output as well as set as the indices.
+
 .. warning::
 
    In the current implementation apply calls func twice on the
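The three behaviours the new notes describe can be checked with a small sketch: reducers move the grouping column into the index unless ``as_index=False`` is passed, transformers keep one output row per input row, and filters return a subset of rows under the original index. This assumes a recent pandas release. The frame and the names ``df``, ``reduced``, ``by_column``, ``filled`` and ``first_rows`` are made up for illustration. Some details have changed since this 2014 patch, such as exactly which columns ``ffill`` returns.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: 'A' is the grouping column, 'C' has gaps.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "C": [1.0, np.nan, 3.0, 5.0, np.nan],
})

# Reducer with the default as_index=True: the group keys become the index
# and 'A' is not a column of the result.
reduced = df.groupby("A").sum()

# Same reducer with as_index=False: the group keys stay a regular column
# and the result gets a fresh integer index.
by_column = df.groupby("A", as_index=False).sum()

# Transformer: one output row per input row, original index preserved.
filled = df.groupby("A").ffill()

# Filter: a subset of the input rows, original index preserved.
first_rows = df.groupby("A").head(1)

print(reduced, by_column, filled, first_rows, sep="\n\n")
```

Only the first result is indexed by the group keys. The transformer and filter results keep the integer index of ``df``, which is what lets them be aligned back against the original frame.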