Commit 10f557f

ENH: this is a pipe
1 parent 5686152 commit 10f557f

7 files changed, +218 -41 lines changed

doc/source/basics.rst

+74
@@ -624,6 +624,77 @@ We can also pass infinite values to define the bins:
 Function application
 --------------------
 
+To apply your own or another library's functions to pandas objects,
+you should be aware of the three methods below. The appropriate
+method to use depends on whether your function expects to operate
+on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
+
+1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
+2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
+3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+
+.. _basics.pipe:
+
+Tablewise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.16.2
+
+``DataFrames`` and ``Series`` can of course just be passed into functions.
+However, if the function needs to be called in a chain, consider using the :meth:`~DataFrame.pipe` method.
+Compare the following
+
+.. code-block:: python
+
+   # f, g, and h are functions taking and returning ``DataFrames``
+   >>> f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+with the equivalent
+
+.. code-block:: python
+
+   >>> (df.pipe(h)
+          .pipe(g, arg1=1)
+          .pipe(f, arg2=2, arg3=3)
+       )
+
+Pandas encourages the second style, which is known as method chaining.
+``pipe`` makes it easy to use your own or another library's functions
+in method chains, alongside pandas' methods.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the ``DataFrame`` as the first positional argument.
+What if the function you wish to apply takes its data as, say, the second argument?
+In this case, provide ``pipe`` with a tuple of ``(callable, data_keyword)``.
+``.pipe`` will route the ``DataFrame`` to the argument specified in the tuple.
+
+For example, we can fit a regression using statsmodels. Their API expects a formula first and a ``DataFrame`` as the second argument, ``data``. We pass in the function, keyword pair ``(sm.poisson, 'data')`` to ``pipe``:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   (bb.query('h > 0')
+      .assign(ln_h = lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+The pipe method is inspired by Unix pipes and more recently dplyr_ and magrittr_, which
+have introduced the popular ``(%>%)`` (read pipe) operator for R_.
+The implementation of ``pipe`` here is quite clean and feels right at home in Python.
+We encourage you to view the source code (``pd.DataFrame.pipe??`` in IPython).
+
+.. _dplyr: https://github.com/hadley/dplyr
+.. _magrittr: https://github.com/smbache/magrittr
+.. _R: http://www.r-project.org
+
+
+Row or Column-wise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Arbitrary functions can be applied along the axes of a DataFrame or Panel
 using the :meth:`~DataFrame.apply` method, which, like the descriptive
 statistics methods, take an optional ``axis`` argument:
@@ -678,6 +749,7 @@ Series operation on each column or row:
    tsdf
    tsdf.apply(pd.Series.interpolate)
 
+
 Finally, :meth:`~DataFrame.apply` takes an argument ``raw`` which is False by default, which
 converts each row or column into a Series before applying the function. When
 set to True, the passed function will instead receive an ndarray object, which
@@ -690,6 +762,8 @@ functionality.
    functionality for grouping by some criterion, applying, and combining the
    results into a Series, DataFrame, etc.
 
+.. _Elementwise:
+
 Applying elementwise Python functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
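
Note on the basics.rst change: the ``(callable, data_keyword)`` form of ``pipe`` can be tried without pandas or statsmodels. The sketch below is illustrative only. ``Table``, ``keep_positive`` and ``describe`` are hypothetical stand-ins (not part of pandas), and ``describe`` merely mimics a data-second signature like the ``(formula, data)`` one the documentation describes for ``sm.poisson``.

```python
class Table:
    """Hypothetical stand-in for a DataFrame: wraps a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, func, *args, **kwargs):
        # Same dispatch as the ``pipe`` body added in this commit:
        # a tuple routes ``self`` to the named keyword argument.
        if isinstance(func, tuple):
            func, target = func
            kwargs[target] = self
            return func(*args, **kwargs)
        return func(self, *args, **kwargs)


def keep_positive(table, column):
    """Data-first function: keep rows whose ``column`` value is > 0."""
    return Table(r for r in table.rows if r[column] > 0)


def describe(formula, data):
    """Data-second function, shaped like ``func(formula, data)``."""
    return "%s on %d rows" % (formula, len(data.rows))


bb = Table([{"h": 0, "hr": 0}, {"h": 10, "hr": 2}, {"h": 7, "hr": 1}])

# Chained style: the table flows top to bottom.
summary = bb.pipe(keep_positive, "h").pipe((describe, "data"), "hr ~ h")

# Equivalent nested style: the logic reads inside out.
nested = describe("hr ~ h", data=keep_positive(bb, "h"))
```

Both spellings produce the same value; the chained one keeps each function next to its own arguments.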

doc/source/faq.rst

-40
@@ -89,46 +89,6 @@ representation; i.e., 1KB = 1024 bytes).
 
 See also :ref:`Categorical Memory Usage <categorical.memory>`.
 
-.. _ref-monkey-patching:
-
-Adding Features to your pandas Installation
--------------------------------------------
-
-pandas is a powerful tool and already has a plethora of data manipulation
-operations implemented, most of them are very fast as well.
-It's very possible however that certain functionality that would make your
-life easier is missing. In that case you have several options:
-
-1) Open an issue on `Github <https://github.com/pydata/pandas/issues/>`__ , explain your need and the sort of functionality you would like to see implemented.
-2) Fork the repo, Implement the functionality yourself and open a PR
-   on Github.
-3) Write a method that performs the operation you are interested in and
-   Monkey-patch the pandas class as part of your IPython profile startup
-   or PYTHONSTARTUP file.
-
-   For example, here is an example of adding an ``just_foo_cols()``
-   method to the dataframe class:
-
-::
-
-    import pandas as pd
-    def just_foo_cols(self):
-        """Get a list of column names containing the string 'foo'
-
-        """
-        return [x for x in self.columns if 'foo' in x]
-
-    pd.DataFrame.just_foo_cols = just_foo_cols # monkey-patch the DataFrame class
-    df = pd.DataFrame([list(range(4))], columns=["A","foo","foozball","bar"])
-    df.just_foo_cols()
-    del pd.DataFrame.just_foo_cols # you can also remove the new method
-
-
-Monkey-patching is usually frowned upon because it makes your code
-less portable and can cause subtle bugs in some circumstances.
-Monkey-patching existing methods is usually a bad idea in that respect.
-When used with proper care, however, it's a very useful tool to have.
-
 
 .. _ref-scikits-migration:

doc/source/whatsnew/v0.16.2.txt

+51
@@ -10,6 +10,7 @@ We recommend that all users upgrade to this version.
 Highlights include:
 
 - Documentation on how to use ``numba`` with *pandas*, see :ref:`here <enhancingperf.numba>`
+- A new ``pipe`` method, see :ref:`here <basics.pipe>`
 
 Check the :ref:`API Changes <whatsnew_0162.api>` before updating.
 
@@ -22,6 +23,56 @@ Check the :ref:`API Changes <whatsnew_0162.api>` before updating.
 New features
 ~~~~~~~~~~~~
 
+We've introduced a new method :meth:`DataFrame.pipe`. As suggested by the name, ``pipe``
+should be used to pipe data through a chain of function calls.
+The goal is to avoid confusing nested function calls like
+
+.. code-block:: python
+
+   # df is a DataFrame, f, g, and h are functions taking and returning DataFrames
+   f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+The logic flows from inside out, and function names are separated from their keyword arguments.
+This can be rewritten as
+
+.. code-block:: python
+
+   (df.pipe(h)
+      .pipe(g, arg1=1)
+      .pipe(f, arg2=2, arg3=3)
+   )
+
+Now both the code and the logic flow from top to bottom. Keyword arguments are next to
+their functions. Overall the code is much more readable.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
+When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
+of ``(function, keyword)`` indicating where the DataFrame should flow. For example:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   # sm.poisson takes (formula, data)
+   (bb.query('h > 0')
+      .assign(ln_h = lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+The pipe method is inspired by Unix pipes, which stream text through
+processes. More recently dplyr_ and magrittr_ have introduced the
+popular ``(%>%)`` pipe operator for R_.
+
+See the :ref:`documentation <basics.pipe>` for more. (:issue:`10129`)
+
+.. _dplyr: https://github.com/hadley/dplyr
+.. _magrittr: https://github.com/smbache/magrittr
+.. _R: http://www.r-project.org
+
 .. _whatsnew_0162.enhancements.other:
 
 Other enhancements

doc/source/whatsnew/v0.17.0.txt

+1
@@ -21,6 +21,7 @@ Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsne
 New features
 ~~~~~~~~~~~~
 
+
 .. _whatsnew_0170.enhancements.other:
 
 Other enhancements

pandas/__init__.py

-1
@@ -57,4 +57,3 @@
 from pandas.util.print_versions import show_versions
 import pandas.util.testing
 
-

pandas/core/generic.py

+50
@@ -2044,6 +2044,56 @@ def sample(self, n=None, frac=None, replace=False, weights=None, random_state=No
         locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
         return self.take(locs, axis=axis)
 
+    _shared_docs['pipe'] = ("""
+        Apply func(self, *args, **kwargs)
+
+        .. versionadded:: 0.16.2
+
+        Parameters
+        ----------
+        func : function
+            function to apply to the %(klass)s.
+            ``args``, and ``kwargs`` are passed into ``func``.
+            Alternatively a ``(callable, data_keyword)`` tuple where
+            ``data_keyword`` is a string indicating the keyword of
+            ``callable`` that expects the %(klass)s.
+        args : positional arguments passed into ``func``
+        kwargs : a dictionary of keyword arguments passed into ``func``.
+
+        Returns
+        -------
+        object : whatever the return type of ``func`` is.
+
+        Notes
+        -----
+
+        Use ``.pipe`` when chaining together functions that expect
+        Series or DataFrames. Instead of writing
+
+        >>> f(g(h(df), arg1=a), arg2=b, arg3=c)
+
+        You can write
+
+        >>> (df.pipe(h)
+               .pipe(g, arg1=a)
+               .pipe(f, arg2=b, arg3=c)
+            )
+
+        See Also
+        --------
+        pandas.DataFrame.apply
+        pandas.DataFrame.applymap
+        pandas.Series.map
+    """
+    )
+    @Appender(_shared_docs['pipe'] % _shared_doc_kwargs)
+    def pipe(self, func, *args, **kwargs):
+        if isinstance(func, tuple):
+            func, target = func
+            kwargs[target] = self
+            return func(*args, **kwargs)
+        else:
+            return func(self, *args, **kwargs)
 
     #----------------------------------------------------------------------
     # Attribute access
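
Note on the ``pipe`` body in this file: as written, the tuple branch assigns ``kwargs[target] = self`` unconditionally, so a caller who also passes the data keyword explicitly appears to have it silently overwritten. The ``test_pipe_tuple_error`` test added to ``pandas/tests/test_generic.py`` in this same commit expects a ``ValueError`` in that case. Below is a minimal sketch of one possible guard, written as a free function so it runs without pandas. The name ``pipe_sketch``, the helper functions, and the error message are all assumptions for illustration, not taken from the commit.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Standalone sketch of the ``pipe`` dispatch with a conflict check."""
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            # Assumed behaviour, inferred from test_pipe_tuple_error:
            # refuse to overwrite a keyword the caller supplied.
            raise ValueError(
                "%s is both the pipe target and a keyword argument" % target
            )
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def power(x, y):
    """Data-first helper, mirroring the lambda in test_pipe."""
    return x ** y


def second(x, y):
    """Returns its second argument, mirroring the lambda in test_pipe_tuple."""
    return y


squared = pipe_sketch(3, power, 2)          # calls power(3, 2)
routed = pipe_sketch(3, (second, "y"), 0)   # calls second(0, y=3)

try:
    pipe_sketch(3, (second, "y"), x=1, y=0)
    conflict_raised = False
except ValueError:
    conflict_raised = True
```

The check runs before the assignment, so the caller's keyword is never replaced without notice.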

pandas/tests/test_generic.py

+42
@@ -1649,6 +1649,48 @@ def test_describe_raises(self):
         with tm.assertRaises(NotImplementedError):
             tm.makePanel().describe()
 
+    def test_pipe(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: x ** y
+        result = df.pipe(f, 2)
+        expected = DataFrame({'A': [1, 4, 9]})
+        self.assert_frame_equal(result, expected)
+
+        result = df.A.pipe(f, 2)
+        self.assert_series_equal(result, expected.A)
+
+    def test_pipe_tuple(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: y
+        result = df.pipe((f, 'y'), 0)
+        self.assert_frame_equal(result, df)
+
+        result = df.A.pipe((f, 'y'), 0)
+        self.assert_series_equal(result, df.A)
+
+    def test_pipe_tuple_error(self):
+        df = DataFrame({"A": [1, 2, 3]})
+        f = lambda x, y: y
+        with tm.assertRaises(ValueError):
+            result = df.pipe((f, 'y'), x=1, y=0)
+
+        with tm.assertRaises(ValueError):
+            result = df.A.pipe((f, 'y'), x=1, y=0)
+
+    def test_pipe_panel(self):
+        wp = Panel({'r1': DataFrame({"A": [1, 2, 3]})})
+        f = lambda x, y: x + y
+        result = wp.pipe(f, 2)
+        expected = wp + 2
+        assert_panel_equal(result, expected)
+
+        result = wp.pipe((f, 'y'), x=1)
+        expected = wp + 1
+        assert_panel_equal(result, expected)
+
+        with tm.assertRaises(ValueError):
+            result = wp.pipe((f, 'y'), x=1, y=1)
+
 if __name__ == '__main__':
     nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
                    exit=False)
