Commit cf810b3

ENH: this is a pipe
1 parent 676cb95 commit cf810b3

8 files changed, +185 -2 lines changed

doc/source/basics.rst

+63
@@ -624,6 +624,66 @@ We can also pass infinite values to define the bins:
 Function application
 --------------------
 
+There are three main cases for function application, depending on what the function
+is expecting. Pandas correspondingly offers a method for each case:
+
+1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
+2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
+3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+
+.. _basics.pipe:
+
+Tablewise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.16.2
+
+DataFrames and Series can of course just be passed into functions.
+However, if the function needs to be called in a chain, consider using the :meth:`~DataFrame.pipe` method.
+Compare the following
+
+.. code-block:: python
+
+   # f, g, and h are functions taking and returning DataFrames
+   >>> f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+with the equivalent
+
+.. code-block:: python
+
+   >>> (df.pipe(h)
+          .pipe(g, arg1=1)
+          .pipe(f, arg2=2, arg3=3)
+       )
+
+Pandas encourages the second style. It flows with the rest of the pandas
+methods, which return DataFrames or Series and are non-mutating by
+default.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
+What if the function you wish to apply takes its data as, say, the second argument?
+In this case, provide ``pipe`` with a tuple of ``(callable, data_keyword)``.
+``.pipe`` will route the DataFrame to the argument specified in the tuple.
+
+For example, we can fit a regression using statsmodels. Their API expects a formula first and a DataFrame as the second argument, ``data``. We pass the ``(function, keyword)`` pair ``(sm.poisson, 'data')`` to ``pipe``:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   (bb.query('h > 0')
+      .assign(ln_h = lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+
+Row or Column-wise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Arbitrary functions can be applied along the axes of a DataFrame or Panel
 using the :meth:`~DataFrame.apply` method, which, like the descriptive
 statistics methods, take an optional ``axis`` argument:
@@ -678,6 +738,7 @@ Series operation on each column or row:
    tsdf
    tsdf.apply(pd.Series.interpolate)
 
+
 Finally, :meth:`~DataFrame.apply` takes an argument ``raw`` which is False by default, which
 converts each row or column into a Series before applying the function. When
 set to True, the passed function will instead receive an ndarray object, which
@@ -690,6 +751,8 @@ functionality.
    functionality for grouping by some criterion, applying, and combining the
    results into a Series, DataFrame, etc.
 
+.. _Elementwise:
+
 Applying elementwise Python functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
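
The chaining style described in the basics.rst text can be tried on a small frame. The sketch below is illustrative only: `drop_missing`, `scale`, and `add_total` are hypothetical helpers standing in for the `h`, `g`, and `f` of the documentation, and the sample data is made up.

```python
import pandas as pd

# Hypothetical helpers (not part of pandas); each takes and returns a DataFrame.
def drop_missing(df):
    return df.dropna()

def scale(df, factor=1):
    return df * factor

def add_total(df, name="total"):
    # Add a column holding the row-wise sum of the existing columns.
    return df.assign(**{name: df.sum(axis=1)})

df = pd.DataFrame({"a": [1.0, 2.0, None], "b": [4.0, 5.0, 6.0]})

# Nested style: reads from the inside out.
nested = add_total(scale(drop_missing(df), factor=2), name="total")

# Piped style: reads top to bottom, arguments sit next to their function.
piped = (df.pipe(drop_missing)
           .pipe(scale, factor=2)
           .pipe(add_total, name="total"))

assert nested.equals(piped)
```

Both spellings call the same functions in the same order; only the reading order differs.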

doc/source/whatsnew/v0.16.2.txt

+41
@@ -20,6 +20,47 @@ Check the :ref:`API Changes <whatsnew_0162.api>` before updating.
 New features
 ~~~~~~~~~~~~
 
+We've introduced a new method :meth:`DataFrame.pipe`. As suggested by the name, ``pipe``
+should be used to pipe data through a chain of function calls.
+The goal is to avoid confusing nested function calls like
+
+.. code-block:: python
+
+   # df is a DataFrame, f, g, and h are functions taking and returning DataFrames
+   f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+The logic flows from inside out, and function names are separated from their keyword arguments.
+This can be rewritten as
+
+.. code-block:: python
+
+   (df.pipe(h)
+      .pipe(g, arg1=1)
+      .pipe(f, arg2=2, arg3=3)
+   )
+
+Now both the code and the logic flow from top to bottom. Keyword arguments are next to
+their functions. Overall the code is much more readable.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument. When the function you wish to apply takes its data anywhere other than the
+first argument, pass a tuple of ``(function, keyword)`` indicating where the DataFrame should flow. For example:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   # sm.poisson takes (formula, data)
+   (bb.query('h > 0')
+      .assign(ln_h = lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+See the :ref:`documentation <basics.pipe>` for more. (:issue:`10129`)
+
 .. _whatsnew_0162.enhancements.other:
 
 Other enhancements
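
The `(function, keyword)` form from the release note can be checked without statsmodels. In this minimal sketch, `column_mean` is a hypothetical function whose data parameter comes second and is named `data`, mirroring the `(formula, data)` shape of `sm.poisson`; the frame contents are invented.

```python
import pandas as pd

# Hypothetical function (not part of pandas): the data is the second
# parameter and is named ``data``.
def column_mean(column, data):
    return data[column].mean()

df = pd.DataFrame({"h": [1, 2, 3], "hr": [0, 1, 5]})

# Direct call: the DataFrame is passed by keyword.
direct = column_mean("hr", data=df)

# Piped call: the tuple tells ``pipe`` to pass ``df`` as ``data=``;
# the remaining positional arguments are forwarded unchanged.
routed = df.pipe((column_mean, "data"), "hr")

assert direct == routed
```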

doc/source/whatsnew/v0.17.0.txt

+1
@@ -21,6 +21,7 @@ Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsne
 New features
 ~~~~~~~~~~~~
 
+
 .. _whatsnew_0170.enhancements.other:
 
 Other enhancements

pandas/__init__.py

-1
@@ -57,4 +57,3 @@
 from pandas.util.print_versions import show_versions
 import pandas.util.testing
 
-

pandas/core/generic.py

+49
@@ -2044,6 +2044,55 @@ def sample(self, n=None, frac=None, replace=False, weights=None, random_state=No
         locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
         return self.take(locs, axis=axis)
 
+    _shared_docs['pipe'] = ("""
+        Apply func(self, *args, **kwargs)
+
+        .. versionadded:: 0.16.2
+
+        Parameters
+        ----------
+        func : function
+            function to apply to the %(klass)s.
+            ``args``, and ``kwargs`` are passed into ``func``.
+            Alternatively a ``(callable, data_keyword)`` pair where
+            ``data_keyword`` is a string indicating the keyword of
+            ``callable`` that expects the %(klass)s.
+        args : positional arguments passed into ``func``.
+        kwargs : a dictionary of keyword arguments passed into ``func``.
+
+        Returns
+        -------
+        object : whatever the return type of ``func`` is.
+
+        Notes
+        -----
+
+        Use ``.pipe`` when chaining together functions that operate
+        on or return Series or DataFrames. Instead of writing
+
+        >>> f(g(h(df), arg1=a), arg2=b, arg3=c)
+
+        You can write
+
+        >>> (df.pipe(h)
+               .pipe(g, arg1=a)
+               .pipe(f, arg2=b, arg3=c)
+            )
+
+        See Also
+        --------
+        pandas.DataFrame.apply
+        pandas.DataFrame.applymap
+        pandas.Series.map
+        """)
+    @Appender(_shared_docs['pipe'] % _shared_doc_kwargs)
+    def pipe(self, func, *args, **kwargs):
+        if isinstance(func, tuple):
+            func, target = func
+            kwargs[target] = self
+            return func(*args, **kwargs)
+        else:
+            return func(self, *args, **kwargs)
 
     #----------------------------------------------------------------------
     # Attribute access
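
The dispatch logic of the new `pipe` method is small enough to study in isolation. The sketch below restates it as a free function over arbitrary objects so it runs without pandas. The `ValueError` guard is an addition for illustration and is not part of this commit; as committed, a caller-supplied keyword with the same name as the target is silently overwritten. `describe` is a hypothetical helper.

```python
def pipe(obj, func, *args, **kwargs):
    """Standalone restatement of the dispatch in the committed ``pipe``."""
    if isinstance(func, tuple):
        # ``func`` is a (callable, data_keyword) pair.
        func, target = func
        # Illustrative guard, NOT in the commit: refuse to overwrite a
        # keyword argument the caller passed explicitly.
        if target in kwargs:
            raise ValueError("%r is both the pipe target and a keyword argument" % target)
        kwargs[target] = obj
        return func(*args, **kwargs)
    # Plain callable: the object becomes the first positional argument.
    return func(obj, *args, **kwargs)


# Hypothetical function whose data parameter is second and named ``data``.
def describe(label, data):
    return (label, data)


assert pipe(3, pow, 2) == 9                                     # pow(3, 2)
assert pipe([1, 2], (describe, "data"), "x") == ("x", [1, 2])   # describe("x", data=[1, 2])
```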

pandas/tests/test_generic.py

+19
@@ -1649,6 +1649,25 @@ def test_describe_raises(self):
         with tm.assertRaises(NotImplementedError):
             tm.makePanel().describe()
 
+    def test_pipe(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: x ** y
+        result = df.pipe(f, 2)
+        expected = DataFrame({'A': [1, 4, 9]})
+        self.assert_frame_equal(result, expected)
+
+        result = df.A.pipe(f, 2)
+        self.assert_series_equal(result, expected.A)
+
+    def test_pipe_tuple(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: y
+        result = df.pipe((f, 'y'), 0)
+        self.assert_frame_equal(result, df)
+
+        result = df.A.pipe((f, 'y'), 0)
+        self.assert_series_equal(result, df.A)
+
 if __name__ == '__main__':
     nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
                    exit=False)

pandas/tests/test_util.py

+11-1
@@ -5,8 +5,9 @@
 
 import sys
 import pandas.util
-from pandas.util.decorators import deprecate_kwarg
+from pandas.util.decorators import deprecate_kwarg, pipeable
 import pandas.util.testing as tm
+from pandas import DataFrame
 
 class TestDecorators(tm.TestCase):
     def setUp(self):
@@ -60,6 +61,15 @@ def test_bad_deprecate_kwarg(self):
             def f4(new=None):
                 pass
 
+    def test_pipeable(self):
+        @pipeable("data")
+        def f(x, y, data=None):
+            return data
+        self.assertTrue(hasattr(f, 'pipe_arg'))
+
+        df = DataFrame({'A': [1, 2]})
+        self.assert_frame_equal(df, df.pipe(f, 1, 2))
+
 
 class TestTesting(tm.TestCase):
 
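
`test_pipeable` exercises a `pipeable` decorator whose definition does not appear in the hunks shown on this page. Purely as an assumption about what the test expects, the decorator could record the data keyword on the function as a `pipe_arg` attribute, and a `pipe` implementation could consult that attribute. Everything below is a hypothetical, pandas-free sketch and not the code in `pandas/util/decorators.py`.

```python
def pipeable(data_keyword):
    """Hypothetical decorator: remember which keyword should receive the data."""
    def decorate(func):
        func.pipe_arg = data_keyword
        return func
    return decorate


def pipe(obj, func, *args, **kwargs):
    """Hypothetical pipe that honours a ``pipe_arg`` attribute when present."""
    target = getattr(func, "pipe_arg", None)
    if target is not None:
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


@pipeable("data")
def f(x, y, data=None):
    return data


table = {"A": [1, 2]}  # stand-in for a DataFrame
assert hasattr(f, "pipe_arg")
assert pipe(table, f, 1, 2) is table
```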

pandas/util/decorators.py

+1
@@ -239,6 +239,7 @@ def knownfailureif(fail_condition, msg=None):
     if msg is None:
         msg = 'Test skipped due to known failure'
 
+
     # Allow for both boolean or callable known failure conditions.
     if callable(fail_condition):
         fail_val = fail_condition
