API: cut interval formatting #8595

Closed

Description

@fancychildren

It would be nice to have a number in front of every label. Putting a prefix like 00, 01, 02 in front of the labels would make them sort in the appropriate order.

Lib\site-packages\pandas\tools\tile.py

def _format_levels(bins, prec, right=True,
                   include_lowest=False):
    fmt = lambda v: _format_label(v, precision=prec)
    cnter = 0
    if right:
        levels = []
        for a, b in zip(bins, bins[1:]):
            fa, fb = fmt(a), fmt(b)

            if a != b and fa == fb:
                raise ValueError('precision too low')

            # Prefix each label with a zero-padded counter so that
            # labels sort in bin order.
            formatted = '%02d: (%s, %s]' % (cnter, fa, fb)
            cnter += 1
            levels.append(formatted)

        if include_lowest:
            # Swap only the opening bracket; the first character is now
            # part of the counter prefix, so slicing it off would corrupt it.
            levels[0] = levels[0].replace('(', '[', 1)
    else:
        levels = ['[%s, %s)' % (fmt(a), fmt(b))
                  for a, b in zip(bins, bins[1:])]

    return levels
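A similar effect may be possible without patching pandas, by passing numbered labels through the `labels` argument of `cut`. The sketch below assumes that approach fits the use case; `numbered_labels` is a hypothetical helper written for illustration, not part of pandas, and the sample values are made up.

```python
import pandas as pd

bins = [0, 50000, 100000, 200000, 400000, 9999999]


def numbered_labels(bins):
    # Hypothetical helper: build "NN: (lo, hi]" labels, one per bin.
    return ['%02d: (%s, %s]' % (i, a, b)
            for i, (a, b) in enumerate(zip(bins, bins[1:]))]


labels = numbered_labels(bins)

# Made-up sample values, one per bin.
upb = pd.Series([25000, 75000, 150000, 300000, 500000])
buckets = pd.cut(upb, bins=bins, labels=labels)

print(buckets.tolist())
```

Because the counter is zero-padded, sorting these labels as plain strings gives the same order as the bins, at least for up to 100 bins.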

Activity

jreback commented on Oct 21, 2014

Contributor

This is going to be turned into a Categorical, so ordering will happen automatically. Interested in doing this for 0.15.1?
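As a rough illustration of that point, here is a minimal sketch assuming a recent pandas release, where `cut` returns an ordered categorical. The sample values are made up.

```python
import pandas as pd

# Made-up values, deliberately out of order.
upb = pd.Series([450000, 25000, 150000, 75000])
buckets = pd.cut(upb, bins=[0, 50000, 100000, 200000, 400000, 9999999])

# The categories carry an order that follows the bin edges.
print(buckets.cat.ordered)

# Sorting uses the category order, not the label text.
order = buckets.sort_values().index.tolist()
print(upb[order].tolist())
```

Sorting the buckets therefore places the value in the `(50000, 100000]` bin before the one in `(100000, 200000]`, which a plain string sort of the labels would not do.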

added this to the 0.15.1 milestone on Oct 21, 2014
jreback commented on Oct 21, 2014

Contributor

@fancychildren can you post a full example (one that doesn't order correctly)?

fancychildren commented on Oct 21, 2014

Author

Here is the example. After I create the cuts and apply the summary to the DataFrame, the currLoanSize buckets are ordered as strings instead of by the numeric value of the lower boundary.

image

After I tweak the code, it appears like this:
image

I'm not sure whether I'm using it correctly, but it would be nice to be able to order by the value of the lower boundary while keeping the label.
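The ordering problem can be reproduced with plain strings. This is only a sketch: the labels are written out by hand to match the bins used in this thread, and `lower_bound` is a hypothetical helper, not a pandas function.

```python
# Labels written out by hand for the bins used in this thread.
labels = ['(0, 50000]', '(50000, 100000]', '(100000, 200000]',
          '(200000, 400000]', '(400000, 9999999]']

# A plain string sort compares character by character, so
# '(100000, ...' lands before '(50000, ...'.
by_text = sorted(labels)
print(by_text)


def lower_bound(label):
    # Hypothetical helper: parse the number after the opening bracket.
    return float(label[1:].split(',')[0])


# Sorting on the parsed lower boundary restores the bin order.
by_value = sorted(labels, key=lower_bound)
print(by_value)
```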

jreback commented on Oct 21, 2014

Contributor

Can you do a programmatic example, e.g. df = DataFrame.......? Eventually we'll turn this into a test.
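For illustration only, a test along those lines might look like the sketch below. It assumes a recent pandas release; the test name and the data are hypothetical and are not taken from the pandas test suite.

```python
import pandas as pd


def test_cut_groupby_keeps_bin_order():
    # Hypothetical test: made-up data with one value per bin.
    df = pd.DataFrame({'upb': [25000, 75000, 150000, 300000, 500000]})
    bins = [0, 50000, 100000, 200000, 400000, 9999999]
    df['currLoanSize'] = pd.cut(df['upb'], bins=bins)

    counts = df.groupby('currLoanSize', observed=False).size()

    # The lower boundaries of the group keys should be increasing.
    lefts = [interval.left for interval in counts.index]
    assert lefts == sorted(lefts)
    assert counts.tolist() == [1, 1, 1, 1, 1]
    return lefts


lefts = test_cut_groupby_keeps_bin_order()
```

Checking the interval boundaries rather than the label text keeps the test independent of how the labels end up being formatted.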

fancychildren commented on Oct 21, 2014

Author

import StringIO
from pandas import *
import numpy as np


data = StringIO.StringIO('''upb coupon
0.00    3.00
25000.00    3.00
50000.00    3.00
75000.00    3.00
100000.00   3.00
125000.00   3.00
150000.00   3.00
175000.00   3.00
200000.00   3.00
225000.00   3.00
250000.00   3.00
275000.00   3.00
300000.00   3.00
325000.00   3.00
350000.00   3.00
375000.00   3.00
400000.00   3.00
425000.00   3.00
450000.00   3.00
475000.00   3.00
500000.00   3.00
525000.00   3.00
550000.00   3.00
575000.00   3.00
600000.00   3.00
625000.00   3.00
650000.00   3.00
675000.00   3.00
700000.00   3.00
725000.00   3.00
750000.00   3.00
775000.00   3.00
800000.00   3.00
825000.00   3.00
850000.00   3.00
875000.00   3.00
900000.00   3.00
925000.00   3.00
950000.00   3.00
975000.00   3.00
1000000.00  3.00
''')

# Split on any whitespace so the data parses whether the columns are
# separated by tabs or by spaces.
df = read_csv(data, sep=r'\s+')

df['currLoanSize'] = cut(df['upb'], bins=[0,50000,100000,200000,400000,9999999])
df['count_pct'] = 1.0/len(df['coupon'])

def f_summary(group):
    # Per-bucket row count and share of all rows.
    return Series({'counts': len(group['coupon']),
                   'count%': np.sum(group['count_pct']),
                   },
                  index = ['counts', 'count%']
                  )

print(df.groupby('currLoanSize').apply(f_summary))

image

modified the milestones: 0.16.0, 0.15.2 on Nov 29, 2014
changed the title from "pandas\tools\title.py" to "API: cut interval formatting" on Nov 29, 2014
removed this from the 0.16.0 milestone on Mar 6, 2015


Metadata

Labels: API Design; Categorical (Categorical Data Type); Interval (Interval data type); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Participants: @wesm, @jreback, @shoyer, @8one6, @fancychildren