

bjonen
Contributor

@bjonen bjonen commented Apr 16, 2014

This adds an info_verbose option. There's a short section in the FAQ and in the basics introduction. The v0.14 release-notes entry is still missing.

Closes #6568

    - if self._info_repr():
    -     self.info(buf=buf)
    + info_verbose = get_option("display.info_verbose")
    + if self._info_repr() and info_verbose:
Contributor


here and in the next section: just pass verbose=info_verbose rather than using an if/else
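A minimal sketch of the suggested simplification, assuming hypothetical stand-ins (`FakeFrame`, a plain `options` dict) instead of the real pandas internals: rather than branching on the option, the repr forwards its value straight to `info`.

```python
import io

# Hypothetical stand-in for the pandas option store; not the real pandas API.
options = {"display.info_verbose": False}


class FakeFrame:
    """Toy object illustrating the repr logic; not pandas.DataFrame."""

    def __init__(self, columns, info_repr=True):
        self.columns = list(columns)
        self._use_info_repr = info_repr

    def _info_repr(self):
        # Roughly: in pandas this checks display.large_repr and whether the
        # frame fits on screen. Here it is just a flag.
        return self._use_info_repr

    def info(self, verbose=True, buf=None):
        lines = ["<class 'FakeFrame'>"]
        if verbose:
            lines += [f"{c}  object" for c in self.columns]
        else:
            lines.append(f"Columns: {len(self.columns)} entries")
        buf.write("\n".join(lines) + "\n")

    def __repr__(self):
        buf = io.StringIO()
        if self._info_repr():
            # No if/else on the option: its value is passed through as-is.
            self.info(verbose=options["display.info_verbose"], buf=buf)
            return buf.getvalue()
        return "<table repr>"
```

Toggling `options["display.info_verbose"]` switches the info repr between the per-column listing and the one-line column count without any extra branching.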

@jorisvandenbossche
Member

@jreback I know you were in favor of making it a separate option instead of adding 'info_short' to display.large_repr, but looking at it now: if I see an option named info_verbose set to True or False, I expect it to set the default for the info method, while it actually only sets the behaviour of info when it is used in the large DataFrame repr. So I find this a bit confusing.

  • what it actually means is large_repr_info_verbose, but that is a rather verbose name :-)
  • it could also set the default for df.info(verbose=..) itself? (But I don't know if that is wanted)
  • or maybe choose display.large_repr='info_short' (or 'info_concise') after all? Or do you think that is more confusing?

@bjonen @jreback What do you think?

@jreback
Contributor

jreback commented Apr 16, 2014

this sets the default for df.info(...)

I agree it's only when the info repr is triggered in the first place

@bjonen if we change this option to really be a subset of large_repr

in other words, have 3 options: False, True/verbose (verbose=True), concise (verbose=False)

that would work yes?

though I think the default should be concise (so we may have to fiddle with this a bit to stay backward compatible)

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

Thanks for your comments.

I think it's a good idea to directly control df.info(verbose=..). Then it makes sense to have a separate option info_verbose, as it is independent of display.large_repr.

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

3 options for large_repr works too for me. Why do you prefer to leave the default for df.info unchanged?

@bjonen
Contributor Author

bjonen commented Apr 16, 2014

@jreback @jorisvandenbossche
Which solution should we go with?

@jorisvandenbossche
Member

So the options would be:

  • having an info_verbose (True/False) option, which would set the default for df.info(verbose=..) itself
  • adding an option to large_repr ('truncate' (default), 'info', and newly 'info_short' or 'info_concise') that only sets the default for info when called in the repr of df

Personally I don't really have a preference (I won't use either option)
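Under the second design, the mapping from display.large_repr to what the repr prints could look roughly like this sketch. 'info_short' is the value proposed in this thread (not an existing pandas value), and `fits_on_screen` is a hypothetical flag standing in for pandas' own size check.

```python
# "info_short" is a proposed value from this thread, not an existing pandas
# option value. fits_on_screen stands in for pandas' own size check.
VALID_LARGE_REPR = {"truncate", "info", "info_short"}


def repr_mode(large_repr, fits_on_screen):
    """Return (kind, verbose) describing how a frame would be displayed.

    kind is "table" or "info"; verbose is only meaningful for "info".
    """
    if large_repr not in VALID_LARGE_REPR:
        raise ValueError(f"invalid display.large_repr: {large_repr!r}")
    if fits_on_screen or large_repr == "truncate":
        return ("table", None)
    # "info" keeps the current verbose summary, "info_short" the concise one.
    return ("info", large_repr == "info")


# Under this design, calling df.info() directly is unaffected:
# only the repr path consults display.large_repr.
```

This makes the trade-off visible: the large_repr variant only changes what a printed frame looks like, while the info_verbose variant would also change the default of a direct df.info() call.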

@jreback
Contributor

jreback commented Apr 16, 2014

hmm I think I like the separate option

with a default of False (which is an API change)

@jorisvandenbossche
Member

Why a default of False? Is there a reason to change the behaviour of df.info()?

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

How do you think we should change the default on df.info?

  • We could have a wrapper that sets verbose and passes it to info but then we lose the ability to do 'df.info?' in ipython to get the default arguments.
  • Alternatively we could have something like below and somehow trigger DataFrame.info = DataFrame.info_non_verbose when the option 'info_verbose' changes.

Any thoughts?

    def info_verbose(self, verbose=True, buf=None, max_cols=None):
        return self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def info_non_verbose(self, verbose=False, buf=None, max_cols=None):
        return self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def _info(self, verbose=True, buf=None, max_cols=None):
        ...  # body of the current DataFrame.info goes here
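A self-contained sketch of the second idea (rebinding `info` whenever the option changes), so that `Frame.info?` in IPython would show the current default. `Frame`, `set_option` and the callback table are hypothetical stand-ins; pandas' own option machinery is not used here.

```python
import inspect


class Frame:
    """Toy class standing in for DataFrame."""

    def _info(self, verbose=True, buf=None, max_cols=None):
        return "verbose" if verbose else "concise"

    def info_verbose(self, verbose=True, buf=None, max_cols=None):
        return self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    def info_non_verbose(self, verbose=False, buf=None, max_cols=None):
        return self._info(verbose=verbose, buf=buf, max_cols=max_cols)

    info = info_verbose  # initial binding matches the initial option value


def _rebind_info(value):
    # Option callback: swap the class attribute so the signature of
    # Frame.info reflects the current default.
    Frame.info = Frame.info_verbose if value else Frame.info_non_verbose


_options = {"display.info_verbose": True}
_callbacks = {"display.info_verbose": _rebind_info}


def set_option(key, value):
    _options[key] = value
    if key in _callbacks:
        _callbacks[key](value)


if __name__ == "__main__":
    set_option("display.info_verbose", False)
    print(inspect.signature(Frame.info))  # default now shows verbose=False
    set_option("display.info_verbose", True)
```

Note that this mutates a class attribute as global state, which is harder to reason about than simply resolving the default inside a single info method.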

@jreback
Contributor

jreback commented Apr 17, 2014

I am pretty sure the default was False in 0.12;
I think it got changed inadvertently in 0.13/0.13.1

@bjonen can u see if that was the case?

@jorisvandenbossche this doesn't need to be complicated
df.info() works as is and u can pass a parameter
all the option will do is to set the parameter

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Yes it did change in 0.13 but it seems it was intentional: #4886

"df.info() works as is and u can pass a parameter
all the option will do is to set the parameter"

If I understand correctly, this is the version currently implemented in the PR. That means the option value (True/False) will only have an impact when a df is printed and not when one calls df.info() because the default is hard coded.

@jreback
Contributor

jreback commented Apr 17, 2014

ok the default wasn't meant to change

change the signature to

df.info(verbose=None)

then handle a passed True/False as an override;
if None, use the info_verbose option,
which I think should default to False

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Ok sounds good!

@bjonen
Contributor Author

bjonen commented Apr 17, 2014

Ok, so I'll adapt the PR.

@jorisvandenbossche
Member

@jreback don't you already have the max_info_columns option for that, to decide when the short and when the long summary is shown? Wouldn't that conflict with an option to set verbose to True or False?

max_info_columns is used in DataFrame.info method to decide if per column information will be printed.

But maybe we are just misinterpreting each other's words, as I think the default for verbose in df.info() has been True for a long time; see e.g. the docs of 0.7: http://pandas.pydata.org/pandas-docs/version/0.7.0/generated/pandas.DataFrame.info.html

@jreback
Contributor

jreback commented Apr 17, 2014

hmm don't know

@bjonen can u investigate this?

@jreback jreback added this to the 0.14.0 milestone Apr 21, 2014
@@ -1666,3 +1666,35 @@ columns of DataFrame objects are shown by default. If ``max_columns`` is set to
0 (the default, in fact), the library will attempt to fit the DataFrame's
string representation into the current terminal width, and defaulting to the
summary view otherwise.

Contributor


can you add the same thing (you can copy-paste) to v0.14.0? It's a bit of a change and we want to inform users. Create a new sub-section (e.g. use ---- under the heading), put it after the plotting sub-section, and include a pointer to the basics section with a ':ref:'

.. ipython:: python

   with option_context("display.large_repr",'info'):
       print df_lrge
Contributor


use print(df_lrge) instead of the print statement (for py3 compat in doc building)

@jreback
Contributor

jreback commented Apr 23, 2014

minor edit

can you post at the top of the issue what the 3 cases are (e.g. what the docs are going to show)?
An IPython screenshot (png) would be even better actually, and you could include it in-line

@jreback
Contributor

jreback commented Apr 23, 2014

@jorisvandenbossche ?

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

@jorisvandenbossche max_info_columns currently does something different. Maybe we can adjust it so that it can play the role of info_verbose. The current behavior doesn't save any space. Also, the default of 100 seems very high to me.

max_columns doesn't seem to have an effect at all under large_repr = 'info'.

In [12]: import pandas as pd

In [13]: df = pd.DataFrame(columns=['a','b','c'],index=pd.DatetimeIndex(start='19900101',end='20000101',freq='BM'))

In [15]: pd.options.display.large_repr = 'info'

In [16]: df
Out[16]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [17]: pd.options.display.max_columns = 1

In [18]: df
Out[18]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a 0 non-null object
b 0 non-null object
c 0 non-null object
dtypes: object(3)

In [19]: pd.options.display.max_info_columns = 1

In [20]: df
Out[20]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Data columns (total 3 columns):
a object
b object
c object
dtypes: object(3)

In [21]: df.info(False)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1990-01-31 00:00:00 to 1999-12-31 00:00:00
Freq: BM
Columns: 3 entries, a to c
dtypes: object(3)

@jreback Looking at previous commits in git:

    git grep -F 'def info(self, verbose=True' $(git rev-list --all)

it seems that the default has been True up to now.

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen ok on the default then (I think I personally set it to False, but that is fine then)

you need to use a frame with > max_info_columns (e.g. 101) to get an effect

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

I set max_info_columns below the number of columns (3) in the df (see previous post):

pd.options.display.max_info_columns = 1

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen hmm.. see if you can figure out from the tests what it is supposed to do. All of the options have complex interactions. If it's a 'bug', I'd rather fix it than add a new option, if we can (e.g. you are suggesting that if you have columns > max_info_columns, then we basically switch info_verbose to False instead of actually having an option, right?)
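The suggestion (no separate option; max_info_columns alone decides, with an explicit argument still able to override) can be sketched as a single predicate. `resolve_verbose` is a hypothetical helper, not a pandas function, and the exact boundary comparison is an assumption.

```python
def resolve_verbose(n_columns, max_info_columns, verbose=None):
    """Decide whether info() prints the per-column listing.

    An explicit verbose=True/False wins; otherwise frames with more than
    max_info_columns columns fall back to the short summary.
    """
    if verbose is not None:
        return bool(verbose)
    return n_columns <= max_info_columns
```

For example, with 5 columns and max_info_columns = 4, `resolve_verbose(5, 4)` returns False, i.e. the short "Columns: ..." summary would be shown.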

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

Yes, exactly. I'll look into it and let you guys know.

@jorisvandenbossche
Member

@bjonen It is expected that max_columns has no effect when large_repr='info': that parameter defines how many columns are shown in the default display, not in the info display (that is what max_info_columns is for).

The strange behaviour of max_info_columns is a bug, I think: a regression in current master. With 0.13 I get:

In [1]: pd.__version__
Out[1]: '0.13.0'

In [5]: df = pd.DataFrame(np.random.randn(5,5))
In [6]: df
Out[6]:
          0         1         2         3         4
0  0.708876 -0.179273  1.367976 -0.929688 -1.138946
1  1.047154  1.049302 -0.248178 -0.957677  1.879843
2 -0.523272 -2.013742  2.064032 -1.389822  1.394960
3  0.224508  1.032544 -1.312425  0.123956  0.144831
4 -1.691660  0.952837  1.380545 -1.279794  1.026131

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0    5  non-null values
1    5  non-null values
2    5  non-null values
3    5  non-null values
4    5  non-null values
dtypes: float64(5)

In [8]: pd.options.display.max_info_columns = 4

In [9]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [10]:

which is much more logical and in line with the explanation (max_info_columns is used in DataFrame.info method to decide if per column information will be printed.)

But that seems like a separate issue.

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

It seems the behavior was introduced in 0.13.1
Sticking with your example:

In [22]: pd.__version__
Out[22]: '0.13.1'

In [27]: pd.options.display.max_info_columns = 100

In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 5 non-null float64
1 5 non-null float64
2 5 non-null float64
3 5 non-null float64
4 5 non-null float64
dtypes: float64(5)

In [29]: pd.options.display.max_info_columns = 1

In [30]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
0 float64
1 float64
2 float64
3 float64
4 float64
dtypes: float64(5)

The explanation for max_info_rows reads:
"df.info() will usually show null-counts for each column. For large frames this can be quite slow. max_info_rows and max_info_cols limit this null check only to frames with smaller dimensions than specified."

So the max_info options as they are implemented right now are concerned with improving display performance and not so much display style.
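A sketch of that performance-oriented behaviour, consistent with the 0.13.1 output above: the column listing stays, and only the non-null counts are skipped for frames beyond the limits. `column_lines` and its callback parameters are hypothetical, and whether each limit uses `<` or `<=` is an assumption.

```python
from typing import Callable, List, Sequence


def column_lines(
    columns: Sequence[str],
    n_rows: int,
    count_non_null: Callable[[str], int],
    dtype_of: Callable[[str], str],
    max_info_rows: int,
    max_info_columns: int,
) -> List[str]:
    """Per-column lines for an info()-style summary.

    The column listing is always produced; the potentially slow non-null
    scan (count_non_null) is only invoked for frames within both limits.
    """
    check_nulls = n_rows <= max_info_rows and len(columns) <= max_info_columns
    lines = []
    for col in columns:
        if check_nulls:
            lines.append(f"{col} {count_non_null(col)} non-null {dtype_of(col)}")
        else:
            lines.append(f"{col} {dtype_of(col)}")
    return lines
```

Passing the null count as a callable makes the point explicit: beyond the limits the expensive scan is never run, but no screen space is saved, which is why these options do not behave like a verbose switch.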

@jreback
Contributor

jreback commented Apr 23, 2014

ahh...I do recall this a bit, @y-p put this in IIRC

to not do the non-null check if you have a very large frame that would be displayed in a summary anyhow

but maybe it introduced a bug (as @jorisvandenbossche describes)

can you simplify this (w/o creating more havoc!)

@jorisvandenbossche
Member

I just opened a separate issue (#6939), as I thought this was separate from this discussion. So can you maybe repeat that over there?
(But of course, if it is not a regression, it has implications for this issue)

@jorisvandenbossche
Member

it was indeed a PR by @y-p: #5974

@jreback
Contributor

jreback commented Apr 23, 2014

@bjonen ok....so pls change this PR to close #6939 as well (it's the same fix)

@bjonen
Contributor Author

bjonen commented Apr 23, 2014

Ok will do.

@jreback
Contributor

jreback commented Apr 28, 2014

@bjonen any luck with this ?

@bjonen
Contributor Author

bjonen commented Apr 29, 2014

@jreback I'm currently working on #5603 (comment) .

@jreback
Contributor

jreback commented May 8, 2014

@bjonen coming along?

@bjonen
Contributor Author

bjonen commented May 8, 2014

I'll submit a PR tonight so you see where I am at.

@bjonen
Contributor Author

bjonen commented May 9, 2014

I pushed the current state to https://github.com/bjonen/pandas/commits/adj_trunc. The truncate representation is generally working. Feel free to check out how large dfs are displayed.

Still there are some tests (mainly in test_frame) not passing. Looking into it...

@jreback
Contributor

jreback commented May 14, 2014

closing in favor of #7130

@jreback jreback closed this May 14, 2014
Labels: API Design, Output-Formatting (__repr__ of pandas objects, to_string)
Linked issue: Option to set large_repr to info(verbose=False) missing
3 participants