Reading/writing of W3C-style embedded metadata in CSV, TSV files #25379

nedclimaterisk · 2019-02-20T01:53:19Z

Related to #2485

The W3C Tabular Data Model recommendation describes embedded metadata that can include arbitrary text data, as well as column-specific metadata such as column data types.

It would be very nice if Pandas could read metadata like this. There is a section with an example of CSV/TSV header metadata that might make a good starting point. The full recommendation seems somewhat vague, but perhaps that means that Pandas could help to define some more specific standards.

Perhaps a YAML header behind # characters, where some known variable names (e.g. datatype) are captured for use in reading the rest of the file, and any remaining unused YAML data is added to a df.metadata dictionary?
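To make the proposal concrete, here is a minimal sketch of how such a header could be split off and used. Everything in it is an assumption for illustration: the function name `read_with_comment_header`, the `CASTS` table, and the header layout are hypothetical, and the parser only understands a tiny YAML-like subset (flat `key: value` pairs plus one level of indentation). A real implementation would presumably hand the stripped lines to a proper YAML parser.

```python
import csv

# Hypothetical mapping from declared type names to Python converters.
CASTS = {"string": str, "integer": int, "float": float}


def read_with_comment_header(text, known_keys=("datatype",)):
    """Split a '#'-prefixed header from CSV text and parse both parts.

    Returns (known, metadata, rows): header entries the reader acts on,
    the leftover header entries, and the data rows as dicts.
    """
    lines = text.splitlines()
    header = {}
    current = None  # nested mapping currently being filled, if any
    body_start = len(lines)
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            body_start = i  # first non-comment line starts the CSV body
            break
        content = line[1:]
        if content.startswith(" "):
            content = content[1:]  # drop the single space after '#'
        if not content.strip():
            continue
        indented = content[0] in " \t"
        key, _, value = content.strip().partition(":")
        key, value = key.strip(), value.strip()
        if indented and current is not None:
            current[key] = value  # entry of the open nested mapping
        elif value == "":
            current = {}  # "key:" with no value opens a nested mapping
            header[key] = current
        else:
            header[key] = value
            current = None

    # Known keys steer the reader; everything else is plain metadata.
    known = {k: header.pop(k) for k in known_keys if k in header}
    dtypes = known.get("datatype")
    if not isinstance(dtypes, dict):
        dtypes = {}
    rows = [
        {col: CASTS.get(dtypes.get(col), str)(val) for col, val in row.items()}
        for row in csv.DictReader(lines[body_start:])
    ]
    return known, header, rows
```

In pandas terms, the leftover dictionary is what would end up in the proposed df.metadata, and the declared types could be translated to pandas dtypes and passed to `read_csv` through its existing `dtype` argument.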

WillAyd · 2019-02-20T04:44:12Z

Thanks - I wasn't even aware of this. I think this is an interesting idea and would agree that the datatype annotations seem like a logical starting point.

PRs are always welcome if you have an idea on how to implement it.

jbrockmendel · 2019-12-13T09:14:04Z

how common is this format in the wild?

naught101 · 2019-12-15T23:39:20Z

Probably not very common at all, but it's a recommended spec, CSV metadata management is a real PITA, and this seems to solve it. Getting it added to the most popular CSV manipulation library around would really help make it more common, I reckon.

There are also potential side benefits. For example, the #datatype declaration would allow immediate inference of datatypes without having to scan the first 100 lines of the CSV.
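As a rough illustration of that side benefit, the sketch below reads a tab-separated table in which annotation rows are marked in the first column (`#name`, `#datatype`), loosely following the embedded-metadata example in the recommendation. The exact layout, the function name `read_declared_types`, and the `CONVERTERS` table are assumptions made for this example, and the sample omits the title row. Column types come straight from the declaration, so no data rows need to be inspected to decide them.

```python
import csv
import io

# Hypothetical mapping from declared type names to Python converters.
CONVERTERS = {"string": str, "integer": int, "float": float}


def read_declared_types(text, delimiter="\t"):
    """Read a table whose first column carries '#name' / '#datatype' markers."""
    names, datatypes, records = [], [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue
        marker, cells = row[0], row[1:]
        if marker == "#name":
            names = cells
        elif marker == "#datatype":
            datatypes = cells
        elif marker.startswith("#"):
            continue  # other annotation rows are ignored in this sketch
        else:
            # Data row: convert each cell using the declared column type.
            converters = [CONVERTERS.get(t, str) for t in datatypes]
            records.append(
                {n: conv(v) for n, conv, v in zip(names, converters, cells)}
            )
    return dict(zip(names, datatypes)), records


sample = "#name\tsensor\treading\n#datatype\tstring\tfloat\n\ta\t21.5\n\tb\t19.0\n"
declared, records = read_declared_types(sample)
```

With pandas, the `declared` mapping could be translated to pandas dtypes and passed to `read_csv` via `dtype`, which skips type inference for those columns.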

WillAyd added the Enhancement and IO CSV (read_csv, to_csv) labels Feb 20, 2019

WillAyd added this to the Contributions Welcome milestone Feb 20, 2019

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
