DataFrames should have a name attribute. #447
Comments
IMO it would make sense only if one were exporting to Excel worksheets; in that case it would be nice to have. |
It can also be used to set default path in |
Could also be integrated into |
I'd upvote this one. I'm using it to auto-title plots and think it would certainly be a nice feature. |
I found uses for it too. However, the name (as of v0.90) doesn't survive pickling, and it would be useful if it did (my workaround is a bit of a fudge). To see the problem, try the following:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.ones([6, 6]))
df.name = 'Ones'
# df.save / pd.load are the v0.9-era pickle helpers (later versions use df.to_pickle / pd.read_pickle)
df.save('ones.df')
df2 = pd.load('ones.df')
print(df2.name)
I'd love to be able to dive in and contribute a fix, but I'm still not so well-versed in the library and many aspects of Python. |
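For comparison, here is a minimal sketch of the same round trip using the `DataFrame.attrs` dictionary. It assumes pandas 1.0 or newer, where `attrs` exists and is documented as experimental; it is not something the thread participants had available at the time.

```python
import pickle

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((6, 6)))
# Keep the name in the attrs dict instead of as an ad hoc df.name attribute.
df.attrs["name"] = "Ones"

# Round-trip through pickle in memory.
restored = pickle.loads(pickle.dumps(df))
print(restored.attrs.get("name"))
```

Whether `attrs` is carried through a given computation depends on the operation and the pandas version, so this only addresses the pickling part of the complaint.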
It's not a simple addition (you have to worry about preserving metadata through computations), but it would be nice. We'll probably look into it in the somewhat near future |
Hi Paul, I have written a couple of functions that will let you transfer all the custom
https://github.com/hugadams/pyuvvis/tree/master/pyuvvis/pandas_utils
In particular, if you save dataframeserial.py, it will save and load your
df=DataFrame()
transfer_attr(df, df2)
I agree that persistent custom attributes would be a key development in the
|
Now that I have some time, I wanted to follow up on this. DataFrame, IMO, should have a .name attribute, and so should df.columns and df.index, or at least I've found this useful in my work. In any case, I think persistent attributes, and to a lesser extent, instance methods, would be an extremely important addition to pandas. Here's my reasoning:
Everybody who uses pandas for analysis outside the scope of time series will eventually benefit from customizing or subclassing a DataFrame. Usually, the dataframe is the ideal object for storing the numerical data, but there is also pertinent information that could go along with it to really customize the object. For example, a dataframe becomes the ideal choice for a spectroscopy experiment if one can store an extra array, the spectral baseline, outside of the dataframe. Additionally, experimental metadata ought to be stored. This is so easily done by adding attributes to the dataframe that it almost begs to be the canonical way to handle spectral data.
The functions I wrote in the above link use a crude method to transfer arbitrary attributes between dataframes. In short, the method first examines an empty DataFrame's attributes and compares these with a list of attributes from the user's dataframe. Any differences are then transferred to the new dataframe. As a hack until a better solution presents itself, methods that return a dataframe could call my transfer_attr() function before returning a new DataFrame. I wouldn't know how to integrate this fully into pandas otherwise.
I know this is low on the priority list, but I really do think that persistent custom attributes would be a big step forward, and not just an appeasement for corner-case users. |
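The compare-against-an-empty-frame idea described above can be sketched roughly as follows. This is an illustration only, not the pyuvvis implementation: the function body, the use of `vars()`, and the rule of skipping underscore-prefixed names are all assumptions made for the example.

```python
import pandas as pd


def transfer_attr(source, target):
    """Copy user-added attributes from source to target (illustrative sketch)."""
    # Instance attributes that every plain DataFrame carries.
    baseline = set(vars(pd.DataFrame()))
    # Anything else on the source, ignoring pandas-internal names.
    custom = [
        key for key in vars(source)
        if key not in baseline and not key.startswith("_")
    ]
    for key in custom:
        # object.__setattr__ avoids DataFrame's column-aware __setattr__.
        object.__setattr__(target, key, getattr(source, key))
    return target


df = pd.DataFrame({"a": [1, 2, 3]})
df.name = "Ones"
df.units = "nm"

df2 = df * 2  # the result is a new frame without the extra attributes
transfer_attr(df, df2)
print(df2.name, df2.units)
```

Every method that returns a new frame would need such a call, which is why the comment above describes it as a hack rather than a real solution.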
Interesting. I'd like to add a couple of notes:
|
y-p, I would be fine with a metadatadict, or whatever is the most elegant solution to the problem. The reason that I like adding attributes is for access. Something like df.name is easier for people to keep up with than df.metadata['name']; however, if you gave the metadata dict attribute access, then df.metadict.name is also pretty simple. Am I understanding you correctly? Whichever solution ends up being the simplest to implement would be useful. I agree that the name issue is separate. If .name is too baked in, as you say, then sure, don't include it. But if the pandas Index object also had a way to persist attributes, or a persistent metadata dict, then one could just slap names, or whatever attributes they want, onto these objects as well. |
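A dictionary with attribute access, as suggested in this comment, could look something like the sketch below. The class name `MetaDict` is hypothetical and nothing here is pandas API; it only shows that `metadict.name` and `metadict['name']` can refer to the same entry.

```python
class MetaDict(dict):
    """Dict whose keys can also be read and written as attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails,
        # so dict methods such as .keys() keep working.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment stores a dictionary entry.
        self[key] = value


meta = MetaDict(name="Ones")
meta.units = "nm"
print(meta.name, meta["units"])
```

One trade-off of this design is that keys which collide with dict method names (for example `items`) can only be reached with bracket syntax.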
I like the idea of providing attribute access under a predefined attribute rather than |
Whatever is best for pandas would be fine with me. If the import gets too
|
new custom metadata issue at #2485. |
Anyone working on adding the name attribute? And where to look for more information on a possible tags property? |
You can try using the metadataframe class if you want. Let me know and I'll
|
Alternatively you can create a composite class that stores the name
|
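One possible shape for such a composite class is sketched below. The name `NamedFrame` and the attribute delegation are assumptions for illustration, since the rest of the comment above is not available.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class NamedFrame:
    """Pairs a DataFrame with a name instead of subclassing DataFrame."""

    name: str
    frame: pd.DataFrame

    def __getattr__(self, attr):
        # Reached only when normal lookup fails; guard against recursion
        # if 'frame' itself has not been set yet.
        if attr == "frame":
            raise AttributeError(attr)
        # Delegate everything else to the wrapped DataFrame.
        return getattr(self.frame, attr)


nf = NamedFrame("Ones", pd.DataFrame({"a": [1, 2]}))
print(nf.name, nf.shape)
```

Results of delegated methods are plain DataFrames, so the name still has to be re-attached by hand after each operation.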
A df name attribute would be useful when slicing panels down to dataframes, parallel to the case where a df column name becomes a series name when sliced. In theory, this should generalize to any number of dimensions. |
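The existing behaviour this comment refers to can be seen in a small example (the column and index labels are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [1.0, 2.0], "volume": [10, 20]},
    index=["x", "y"],
)

col = df["price"]   # selecting a column yields a Series named after it
row = df.loc["x"]   # selecting a row yields a Series named after its label

print(col.name)
print(row.name)
```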
+1 on this. I was wanting it just a few days ago. |
updated title. we'll see when the rest can follow. |
Looking forward to this. Related: http://stackoverflow.com/questions/11672403/adding-my-own-description-attribute-to-a-pandas-dataframe |
@hugadams btw - columns and index now get name attributes (if you have a hierarchical index, it's called names...) not sure if that covers what you were looking for in terms of columns and index. |
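A short sketch of the index and column naming mentioned here, with made-up labels:

```python
import pandas as pd

# A flat index and the columns each carry a single .name.
df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], name="sample"))
df.columns.name = "variable"

# A hierarchical index carries one name per level in .names.
levels = pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["sample", "run"])
df2 = pd.DataFrame({"a": [1, 2]}, index=levels)

print(df.index.name, df.columns.name, list(df2.index.names))
```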
@jtratner I added support for this by just defining _prop_attributes (see Series). It's not complete, though: Series mostly uses the original old method, and we need a better way to resolve name conflicts and such (e.g., when you add frames with different names, what happens?). Series has the same issue, so it's a bit of a project, but all the support is there for this. |
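In current pandas, the documented way for a subclass to declare persistent attributes is the `_metadata` list. The sketch below assumes that hook plays the role described for `_prop_attributes` in this comment; the class name `NamedDataFrame` is hypothetical.

```python
import pandas as pd


class NamedDataFrame(pd.DataFrame):
    # Attributes listed here are carried along by __finalize__.
    _metadata = ["name"]

    @property
    def _constructor(self):
        # Make pandas operations return this subclass, not a plain DataFrame.
        return NamedDataFrame


df = NamedDataFrame({"a": [1, 2, 3]})
df.name = "Ones"

copied = df.copy()
print(type(copied).__name__, copied.name)
```

This does not settle the name-conflict question raised above: what a binary operation on two frames with different names should produce is left open.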
@jreback , does doing this fit in naturally into the NDFrame unification deal? |
technically this is easy (just add to |
series names aren't implemented as metadata, and series now derive from NDFrame. |
@y-p ahh but they are! (well they are in the |
Well... then they didn't use to be. Never mind. Let it sit for another 18 months or so. |
Another use of a name attribute would be for GUIs dealing with dataframes. I have one such program which allows a user to load many CSV files and plot columns from them. The backend uses dataframes to load and store the CSV data. |
In the IPython notebook and similar REPLs, it would make sense to display the dataframe name or, more generally, custom metadata, like in the Excel toolbar (count, min, max, average, sum, nans, numerical count). |
When saving results of an analysis, resulting in several different outputs, it would be so nice to automatically save and name your output:
|
Ohhh. I see now from StackExchange that I can do something like this:
|
@summer Rae, great finding! Thanks for sharing. Also, computed properties are contained in .describe().
|
@summerela wonderful finding! |
Adding a |
I realize that this is a year out of date, but I'd like to pitch in a use case where having a name for a data frame can be really useful. When performing multi-block analysis (e.g., multi-block partial least squares) in another package (like statsmodels), it would be awesome if we could specify R-style formulas via patsy and run this sort of analysis with something like the following
where |
@mortonjt For this sort of multi-dimensional data analysis, I would consider using xarray, which does already support a |
I totally forgot about xarray ... Thanks @shoyer! |
Hey, since the pandas API sometimes provides DataFrames with
import logging
import pandas as pd
def validate_individual_parts(df: pd.DataFrame) -> None:
    # Warn when a part number maps to more than one distinct system value
    if len(df['system'].unique()) > 1:
        logging.warning(f'Unable to determine system for part {df.name}')
df = pd.DataFrame({'part_no': [1, 1, 2, 2], 'system': ['fax machine', None, 'fax machine', 'truck']})
gb = df.groupby('part_no')
gb.apply(validate_individual_parts)
I'm not sure what the engineering principle at play here is, but it seems reasonable to expect all instances of a DataFrame produced by the pandas API to have the same set of attributes. In the above context, an |
these are supported via the |
was: should DataFrames have a name attribute?
@y-p