From d5663830d57d76f4a0c68c9971626e4d03406546 Mon Sep 17 00:00:00 2001
From: tp
Date: Tue, 29 Aug 2017 23:36:54 +0100
Subject: [PATCH 1/5] Cleaned references to versions <0.12 in docs

---
 doc/source/basics.rst        |  8 ++++----
 doc/source/dsintro.rst       |  7 +++----
 doc/source/groupby.rst       |  4 +---
 doc/source/indexing.rst      |  2 --
 doc/source/io.rst            | 14 +++++++-------
 doc/source/missing_data.rst  |  9 ++++-----
 doc/source/timeseries.rst    |  3 +--
 doc/source/visualization.rst |  6 ------
 8 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index fe20a7eb2b786..b01ca96a611f2 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -251,8 +251,8 @@ replace NaN with some other value using ``fillna`` if you wish).
 Flexible Comparisons
 ~~~~~~~~~~~~~~~~~~~~

-Starting in v0.8, pandas introduced binary comparison methods eq, ne, lt, gt,
-le, and ge to Series and DataFrame whose behavior is analogous to the binary
+Note that Series and DataFrame have the binary comparison methods eq, ne, lt, gt,
+le, and ge whose behavior is analogous to the binary
 arithmetic operations described above:

 .. ipython:: python
@@ -1908,7 +1908,7 @@ each type in a ``DataFrame``:

     dft.get_dtype_counts()

-Numeric dtypes will propagate and can coexist in DataFrames (starting in v0.11.0).
+Numeric dtypes will propagate and can coexist in DataFrames.
 If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
 or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore,
 different numeric dtypes will **NOT** be combined. The following example will give you a taste.
@@ -2137,7 +2137,7 @@ gotchas
 ~~~~~~~

 Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``.
-The dtype of the input data will be preserved in cases where ``nans`` are not introduced (starting in 0.11.0)
+The dtype of the input data will be preserved in cases where ``nans`` are not introduced.
 See also :ref:`Support for integer NA `

 .. ipython:: python
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index 3c6572229802d..e5add0639432d 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -73,7 +73,7 @@ index is passed, one will be created having values ``[0, ..., len(data) - 1]``.

 .. note::

-    Starting in v0.8.0, pandas supports non-unique index values. If an operation
+    pandas supports non-unique index values. If an operation
     that does not support duplicate index values is attempted, an exception
     will be raised at that time. The reason for being lazy is nearly all performance-based
     (there are many instances in computations, like parts of GroupBy, where the index
@@ -698,7 +698,7 @@ DataFrame in tabular form, though it won't always fit the console width:

     print(baseball.iloc[-20:, :12].to_string())

-New since 0.10.0, wide DataFrames will now be printed across multiple rows by
+Note that wide DataFrames will be printed across multiple rows by
 default:

 .. ipython:: python
@@ -856,8 +856,7 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to
 From DataFrame using ``to_panel`` method
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-This method was introduced in v0.7 to replace ``LongPanel.to_long``, and converts
-a DataFrame with a two-level index to a Panel.
+``to_panel`` converts a DataFrame with a two-level index to a Panel.

 .. ipython:: python
    :okwarning:
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 937d682d238b3..0be60d2301b6b 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -140,7 +140,7 @@ columns:

     In [5]: grouped = df.groupby(get_letter_type, axis=1)

-Starting with 0.8, pandas Index objects now support duplicate values. If a
+Note that pandas Index objects support duplicate values. If a
 non-unique index is used as the group key in a groupby operation, all values
 for the same index value will be considered to be in one group and thus the
 output of aggregation functions will only contain unique index values:
@@ -288,8 +288,6 @@ chosen level:

     s.sum(level='second')

-.. versionadded:: 0.6
-
 Grouping with multiple levels is supported.

 .. ipython:: python
diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index 53a259ad6eb15..4687e46490562 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -66,8 +66,6 @@ See the :ref:`cookbook` for some advanced strategies
 Different Choices for Indexing
 ------------------------------

-.. versionadded:: 0.11.0
-
 Object selection has had a number of user-requested additions in order to
 support more explicit location based indexing. Pandas now supports three types
 of multi-axis indexing.
diff --git a/doc/source/io.rst b/doc/source/io.rst
index e338407361705..5be68a93f8e3f 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -364,7 +364,7 @@ warn_bad_lines : boolean, default ``True``
 Specifying column data types
 ''''''''''''''''''''''''''''

-Starting with v0.10, you can indicate the data type for the whole DataFrame or
+You can indicate the data type for the whole DataFrame or
 individual columns:

 .. ipython:: python
@@ -3346,7 +3346,7 @@ Read/Write API
 ''''''''''''''

 ``HDFStore`` supports an top-level API using ``read_hdf`` for reading and ``to_hdf`` for writing,
-similar to how ``read_csv`` and ``to_csv`` work. (new in 0.11.0)
+similar to how ``read_csv`` and ``to_csv`` work.

 .. ipython:: python

@@ -3791,7 +3791,7 @@ indexed dimension as the ``where``.

 .. note::

-   Indexes are automagically created (starting ``0.10.1``) on the indexables
+   Indexes are automagically created on the indexables
    and any data columns you specify. This behavior can be turned off by passing
    ``index=False`` to ``append``.

@@ -3878,7 +3878,7 @@ create a new table!)
 Iterator
 ++++++++

-Starting in ``0.11.0``, you can pass, ``iterator=True`` or ``chunksize=number_in_a_chunk``
+Note that you can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
 to ``select`` and ``select_as_multiple`` to return an iterator on the results.
 The default is 50,000 rows returned in a chunk.
@@ -3986,8 +3986,8 @@ of rows in an object.
 Multiple Table Queries
 ++++++++++++++++++++++

-New in 0.10.1 are the methods ``append_to_multiple`` and
-``select_as_multiple``, that can perform appending/selecting from
+The methods ``append_to_multiple`` and
+``select_as_multiple`` can perform appending/selecting from
 multiple tables at once. The idea is to have one table (call it the
 selector table) that you index most/all of the columns, and perform your
 queries. The other table(s) are data tables with an index matching the
@@ -4291,7 +4291,7 @@ Pass ``min_itemsize`` on the first table creation to a-priori specify the minimu
 ``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to allow all *indexables* or *data_columns* to have this min_itemsize.

-Starting in 0.11.0, passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.
+Passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.

 .. note::
diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index d54288baa389b..3ad08f6819642 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -67,9 +67,8 @@ arise and we wish to also consider that "missing" or "not available" or "NA".

 .. note::

-   Prior to version v0.10.0 ``inf`` and ``-inf`` were also
-   considered to be "NA" in computations. This is no longer the case by
-   default; use the ``mode.use_inf_as_na`` option to recover it.
+   If you want to consider ``inf`` and ``-inf``
+   to be "NA" in computations, you can use the ``mode.use_inf_as_na`` option to archieve it.

 .. _missing.isna:

@@ -485,8 +484,8 @@ respectively:
 Replacing Generic Values
 ~~~~~~~~~~~~~~~~~~~~~~~~

-Often times we want to replace arbitrary values with other values. New in v0.8
-is the ``replace`` method in Series/DataFrame that provides an efficient yet
+Often times we want to replace arbitrary values with other values. The
+``replace`` method in Series/DataFrame provides an efficient yet
 flexible way to perform such replacements.

 For a Series, you can replace a single value or a list of values by another
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index ce4a920ad77b5..aded5e4402df2 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -1069,8 +1069,7 @@ Offset Aliases
 ~~~~~~~~~~~~~~

 A number of string aliases are given to useful common time series
-frequencies. We will refer to these aliases as *offset aliases*
-(referred to as *time rules* prior to v0.8.0).
+frequencies. We will refer to these aliases as *offset aliases*.

 .. csv-table::
     :header: "Alias", "Description"
diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst
index fb799c642131d..c637246537ca1 100644
--- a/doc/source/visualization.rst
+++ b/doc/source/visualization.rst
@@ -306,8 +306,6 @@ subplots:

     df.diff().hist(color='k', alpha=0.5, bins=50)

-.. versionadded:: 0.10.0
-
 The ``by`` keyword can be specified to plot grouped histograms:

 .. ipython:: python
@@ -831,8 +829,6 @@ and take a :class:`Series` or :class:`DataFrame` as an argument.
 Scatter Matrix Plot
 ~~~~~~~~~~~~~~~~~~~

-.. versionadded:: 0.7.3
-
 You can create a scatter plot matrix using the ``scatter_matrix`` method in
 ``pandas.plotting``:

@@ -859,8 +855,6 @@ You can create a scatter plot matrix using the
 Density Plot
 ~~~~~~~~~~~~

-.. versionadded:: 0.8.0
-
 You can create density plots using the :meth:`Series.plot.kde` and
 :meth:`DataFrame.plot.kde` methods.

 .. ipython:: python

From 6abd5cc31b391fef44d8f7df8a6a6ce244ad46b9 Mon Sep 17 00:00:00 2001
From: tp
Date: Wed, 30 Aug 2017 12:00:18 +0100
Subject: [PATCH 2/5] Updated according to comments

---
 doc/source/basics.rst  |  4 ++--
 doc/source/dsintro.rst |  8 +++-----
 doc/source/groupby.rst |  2 +-
 doc/source/io.rst      | 40 +---------------------------------------
 4 files changed, 7 insertions(+), 47 deletions(-)

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index b01ca96a611f2..35eb14eda238f 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -251,8 +251,8 @@ replace NaN with some other value using ``fillna`` if you wish).
 Flexible Comparisons
 ~~~~~~~~~~~~~~~~~~~~

-Note that Series and DataFrame have the binary comparison methods eq, ne, lt, gt,
-le, and ge whose behavior is analogous to the binary
+Series and DataFrame have the binary comparison methods ``eq``, ``ne``, ``lt``, ``gt``,
+``le``, and ``ge`` whose behavior is analogous to the binary
 arithmetic operations described above:

 .. ipython:: python
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index e5add0639432d..4652ccbf0ad34 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -698,7 +698,7 @@ DataFrame in tabular form, though it won't always fit the console width:

     print(baseball.iloc[-20:, :12].to_string())

-Note that wide DataFrames will be printed across multiple rows by
+Wide DataFrames will be printed across multiple rows by
 default:

 .. ipython:: python
@@ -845,11 +845,9 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to

 .. note::

-   Unfortunately Panel, being less commonly used than Series and DataFrame,
+   Panel, being less commonly used than Series and DataFrame,
    has been slightly neglected feature-wise. A number of methods and options
-   available in DataFrame are not available in Panel. This will get worked
-   on, of course, in future releases. And faster if you join me in working on
-   the codebase.
+   available in DataFrame are not available in Panel.

 .. _dsintro.to_panel:
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 0be60d2301b6b..53c0b771555f8 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -140,7 +140,7 @@ columns:

     In [5]: grouped = df.groupby(get_letter_type, axis=1)

-Note that pandas Index objects support duplicate values. If a
+pandas Index objects support duplicate values. If a
 non-unique index is used as the group key in a groupby operation, all values
 for the same index value will be considered to be in one group and thus the
 output of aggregation functions will only contain unique index values:
diff --git a/doc/source/io.rst b/doc/source/io.rst
index 5be68a93f8e3f..f68358764a40e 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3878,7 +3878,7 @@ create a new table!)
 Iterator
 ++++++++

-Note that you can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
+You can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
 to ``select`` and ``select_as_multiple`` to return an iterator on the results.
 The default is 50,000 rows returned in a chunk.
@@ -4419,44 +4419,6 @@ Now you can import the ``DataFrame`` into R:
    starting point if you have stored multiple ``DataFrame`` objects to a
    single HDF5 file.

-Backwards Compatibility
-'''''''''''''''''''''''
-
-0.10.1 of ``HDFStore`` can read tables created in a prior version of pandas,
-however query terms using the
-prior (undocumented) methodology are unsupported. ``HDFStore`` will
-issue a warning if you try to use a legacy-format file. You must
-read in the entire file and write it out using the new format, using the
-method ``copy`` to take advantage of the updates. The group attribute
-``pandas_version`` contains the version information. ``copy`` takes a
-number of options, please see the docstring.
-
-
-.. ipython:: python
-   :suppress:
-
-   import os
-   legacy_file_path = os.path.abspath('source/_static/legacy_0.10.h5')
-
-.. ipython:: python
-   :okwarning:
-
-   # a legacy store
-   legacy_store = pd.HDFStore(legacy_file_path,'r')
-   legacy_store
-
-   # copy (and return the new handle)
-   new_store = legacy_store.copy('store_new.h5')
-   new_store
-   new_store.close()
-
-.. ipython:: python
-   :suppress:
-
-   legacy_store.close()
-   import os
-   os.remove('store_new.h5')
-
 Performance
 '''''''''''

From 5bc8714144ede498bcbb545bb19222e3c95bf76a Mon Sep 17 00:00:00 2001
From: tp
Date: Fri, 1 Sep 2017 01:10:32 +0100
Subject: [PATCH 3/5] Revert "Updated according to comments"

This reverts commit 6abd5cc31b391fef44d8f7df8a6a6ce244ad46b9.
---
 doc/source/basics.rst  |  4 ++--
 doc/source/dsintro.rst |  8 +++++---
 doc/source/groupby.rst |  2 +-
 doc/source/io.rst      | 40 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index 35eb14eda238f..b01ca96a611f2 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -251,8 +251,8 @@ replace NaN with some other value using ``fillna`` if you wish).
 Flexible Comparisons
 ~~~~~~~~~~~~~~~~~~~~

-Series and DataFrame have the binary comparison methods ``eq``, ``ne``, ``lt``, ``gt``,
-``le``, and ``ge`` whose behavior is analogous to the binary
+Note that Series and DataFrame have the binary comparison methods eq, ne, lt, gt,
+le, and ge whose behavior is analogous to the binary
 arithmetic operations described above:

 .. ipython:: python
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index 4652ccbf0ad34..e5add0639432d 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -698,7 +698,7 @@ DataFrame in tabular form, though it won't always fit the console width:

     print(baseball.iloc[-20:, :12].to_string())

-Wide DataFrames will be printed across multiple rows by
+Note that wide DataFrames will be printed across multiple rows by
 default:

 .. ipython:: python
@@ -845,9 +845,11 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to

 .. note::

-   Panel, being less commonly used than Series and DataFrame,
+   Unfortunately Panel, being less commonly used than Series and DataFrame,
    has been slightly neglected feature-wise. A number of methods and options
-   available in DataFrame are not available in Panel.
+   available in DataFrame are not available in Panel. This will get worked
+   on, of course, in future releases. And faster if you join me in working on
+   the codebase.

 .. _dsintro.to_panel:
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 53c0b771555f8..0be60d2301b6b 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -140,7 +140,7 @@ columns:

     In [5]: grouped = df.groupby(get_letter_type, axis=1)

-pandas Index objects support duplicate values. If a
+Note that pandas Index objects support duplicate values. If a
 non-unique index is used as the group key in a groupby operation, all values
 for the same index value will be considered to be in one group and thus the
 output of aggregation functions will only contain unique index values:
diff --git a/doc/source/io.rst b/doc/source/io.rst
index f68358764a40e..5be68a93f8e3f 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3878,7 +3878,7 @@ create a new table!)
 Iterator
 ++++++++

-You can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
+Note that you can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
 to ``select`` and ``select_as_multiple`` to return an iterator on the results.
 The default is 50,000 rows returned in a chunk.
@@ -4419,6 +4419,44 @@ Now you can import the ``DataFrame`` into R:
    starting point if you have stored multiple ``DataFrame`` objects to a
    single HDF5 file.

+Backwards Compatibility
+'''''''''''''''''''''''
+
+0.10.1 of ``HDFStore`` can read tables created in a prior version of pandas,
+however query terms using the
+prior (undocumented) methodology are unsupported. ``HDFStore`` will
+issue a warning if you try to use a legacy-format file. You must
+read in the entire file and write it out using the new format, using the
+method ``copy`` to take advantage of the updates. The group attribute
+``pandas_version`` contains the version information. ``copy`` takes a
+number of options, please see the docstring.
+
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   legacy_file_path = os.path.abspath('source/_static/legacy_0.10.h5')
+
+.. ipython:: python
+   :okwarning:
+
+   # a legacy store
+   legacy_store = pd.HDFStore(legacy_file_path,'r')
+   legacy_store
+
+   # copy (and return the new handle)
+   new_store = legacy_store.copy('store_new.h5')
+   new_store
+   new_store.close()
+
+.. ipython:: python
+   :suppress:
+
+   legacy_store.close()
+   import os
+   os.remove('store_new.h5')
+
 Performance
 '''''''''''

From 7b9bc62b3292eff2c009047b0f89ae58193913f2 Mon Sep 17 00:00:00 2001
From: tp
Date: Fri, 1 Sep 2017 01:18:25 +0100
Subject: [PATCH 4/5] Pull out again the part about HDFStore backwards
 compatibility

---
 doc/source/basics.rst  | 4 ++--
 doc/source/dsintro.rst | 8 +++-----
 doc/source/groupby.rst | 2 +-
 doc/source/io.rst      | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index b01ca96a611f2..35eb14eda238f 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -251,8 +251,8 @@ replace NaN with some other value using ``fillna`` if you wish).
 Flexible Comparisons
 ~~~~~~~~~~~~~~~~~~~~

-Note that Series and DataFrame have the binary comparison methods eq, ne, lt, gt,
-le, and ge whose behavior is analogous to the binary
+Series and DataFrame have the binary comparison methods ``eq``, ``ne``, ``lt``, ``gt``,
+``le``, and ``ge`` whose behavior is analogous to the binary
 arithmetic operations described above:

 .. ipython:: python
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index e5add0639432d..4652ccbf0ad34 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -698,7 +698,7 @@ DataFrame in tabular form, though it won't always fit the console width:

     print(baseball.iloc[-20:, :12].to_string())

-Note that wide DataFrames will be printed across multiple rows by
+Wide DataFrames will be printed across multiple rows by
 default:

 .. ipython:: python
@@ -845,11 +845,9 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to

 .. note::

-   Unfortunately Panel, being less commonly used than Series and DataFrame,
+   Panel, being less commonly used than Series and DataFrame,
    has been slightly neglected feature-wise. A number of methods and options
-   available in DataFrame are not available in Panel. This will get worked
-   on, of course, in future releases. And faster if you join me in working on
-   the codebase.
+   available in DataFrame are not available in Panel.

 .. _dsintro.to_panel:
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 0be60d2301b6b..53c0b771555f8 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -140,7 +140,7 @@ columns:

     In [5]: grouped = df.groupby(get_letter_type, axis=1)

-Note that pandas Index objects support duplicate values. If a
+pandas Index objects support duplicate values. If a
 non-unique index is used as the group key in a groupby operation, all values
 for the same index value will be considered to be in one group and thus the
 output of aggregation functions will only contain unique index values:
diff --git a/doc/source/io.rst b/doc/source/io.rst
index 5be68a93f8e3f..74ef6ea917ae7 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3878,7 +3878,7 @@ create a new table!)
 Iterator
 ++++++++

-Note that you can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
+You can pass ``iterator=True`` or ``chunksize=number_in_a_chunk``
 to ``select`` and ``select_as_multiple`` to return an iterator on the results.
 The default is 50,000 rows returned in a chunk.

From 7d18e067a62678f172db3ce212b329a20b2155b4 Mon Sep 17 00:00:00 2001
From: tp
Date: Fri, 1 Sep 2017 21:19:15 +0100
Subject: [PATCH 5/5] Improve note on using inf as nan in calculations

---
 doc/source/missing_data.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index 3ad08f6819642..64a321d67a825 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -67,8 +67,8 @@ arise and we wish to also consider that "missing" or "not available" or "NA".

 .. note::

-   If you want to consider ``inf`` and ``-inf``
-   to be "NA" in computations, you can use the ``mode.use_inf_as_na`` option to archieve it.
+   If you want to consider ``inf`` and ``-inf`` to be "NA" in computations,
+   you can set ``pandas.options.mode.use_inf_as_na = True``.

 .. _missing.isna:
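As a quick sketch of the flexible comparison behavior that the reworded basics.rst passage describes (illustrative only, not part of the patch series; the data values are invented for the example):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
t = pd.Series([3, 2, 1])

# eq/ne/lt/gt/le/ge compare elementwise, aligning on the index,
# and return boolean Series — analogous to the binary arithmetic ops.
print(s.eq(t).tolist())  # [False, True, False]
print(s.le(t).tolist())  # [True, True, False]
```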
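The groupby.rst hunk keeps the statement that a non-unique index used as a group key collapses all values sharing an index label into one group; a minimal sketch of that behavior (example data is invented):

```python
import pandas as pd

# Duplicate index labels: 'a' and 'b' each appear twice.
s = pd.Series([1, 2, 3, 4], index=["a", "a", "b", "b"])

# Aggregating by the index yields one row per unique label.
totals = s.groupby(level=0).sum()
print(totals.to_dict())  # {'a': 3, 'b': 7}
```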
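The io.rst hunk on "Specifying column data types" drops the version reference but keeps the feature; a small sketch of passing ``dtype`` to ``read_csv``, either for the whole frame or per column (the CSV content here is made up):

```python
import io

import pandas as pd

csv = io.StringIO("a,b\n1,2\n3,4")
# dtype may be a single type for every column, or a dict per column.
df = pd.read_csv(csv, dtype={"a": "float64", "b": "int64"})
print(df.dtypes.to_dict())
```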
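Likewise, the missing_data.rst "Replacing Generic Values" hunk now introduces ``replace`` without the version note; a sketch of the single-value and dict forms it describes (values are invented for illustration):

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0])

# Replace one value, or several at once via a dict mapping old -> new.
out = s.replace({0.0: 5.0, 1.0: 10.0})
print(out.tolist())  # [5.0, 10.0, 2.0, 3.0]
```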