It would be very nice if Pandas could read metadata like this. There is a section with an example of CSV/TSV header metadata that might make a good starting point. The full recommendation seems somewhat vague, but perhaps that means that Pandas could help to define some more specific standards.
Perhaps a YAML header behind # characters, where some known variable names (e.g. datatype) are captured for use in reading the rest of the file, and any remaining unused YAML data is added to a df.metadata dictionary?
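To make the idea concrete, here is a rough sketch of how such a header could be split off before the table is parsed. Everything in it is an assumption for illustration: the header layout (flat `key: value` lines behind `#`), the comma-separated positional `datatype` value, and the helper name `read_header_metadata` are all hypothetical and not part of pandas or of the W3C recommendation. It uses only the standard library and handles flat keys only; a real implementation would presumably hand the stripped header to a proper YAML parser.

```python
# Hypothetical sample file: the layout of the commented header is an
# assumption made for this sketch, not a format defined by pandas or W3C.
SAMPLE = (
    "# title: Example measurements\n"
    "# datatype: int64,float64\n"
    "id,value\n"
    "1,0.5\n"
    "2,1.5\n"
)


def read_header_metadata(text, prefix="#", known_key="datatype"):
    """Split leading commented lines from the CSV body (hypothetical helper).

    Returns (metadata, dtype_map, body):
      metadata  - leftover key/value pairs from the header
      dtype_map - column name -> dtype string, built from `known_key`
      body      - the remaining CSV text, starting at the column-name row
    """
    metadata = {}
    lines = text.splitlines(keepends=True)
    n_header = 0
    for line in lines:
        if not line.startswith(prefix):
            break  # first non-comment line ends the header
        n_header += 1
        # Only flat "key: value" pairs are understood in this sketch.
        key, sep, value = line[len(prefix):].partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    body = "".join(lines[n_header:])

    # Capture the known key and pair its values with the column names
    # taken from the first row of the body.
    dtype_map = {}
    if known_key in metadata and body:
        dtypes = [d.strip() for d in metadata.pop(known_key).split(",")]
        columns = body.splitlines()[0].split(",")
        dtype_map = dict(zip(columns, dtypes))

    return metadata, dtype_map, body


metadata, dtype_map, body = read_header_metadata(SAMPLE)
```

With pandas installed, `body` and `dtype_map` could then be passed on as `pd.read_csv(io.StringIO(body), dtype=dtype_map)`, so the column types come from the header instead of being inferred. The leftover `metadata` dictionary is what the proposed df.metadata would hold; as a stand-in, it could be stored in the existing `DataFrame.attrs` dictionary.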
Thanks - I wasn't even aware of this. I think this is an interesting idea and would agree that the datatype annotations seem like a logical starting point.
PRs are always welcome if you have an idea on how to implement this.
Probably not very at all, but it's a recommended spec, CSV metadata management is a real PITA, and this seems to solve it. Getting it added to the most popular CSV manipulation library around would really help make it more common, I reckon.
There are also potential side-benefits, for example the #datatype declaration would allow immediate inference of datatypes without having to scan the first 100 lines of the CSV.
Related to #2485
The W3C Tabular Data Model recommendation covers metadata that can include arbitrary text data, as well as column-specific metadata, such as column data types.