Refactor to use the dict representation of tree sequence #39

jeromekelleher · 2021-04-08T07:49:44Z

Currently we explicitly list the columns that we compress, so any column added by a tskit format update is silently dropped. We should use the dict representation of the tables instead, and automatically create Column objects for the data with good default compression options. We can special-case particular columns if we like, but by default we should always store all the data that comes from the input tree sequence.

related to #35
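To make the proposal concrete, here is a minimal sketch of the idea, not the project's implementation. It walks a nested dict and compresses every array it finds, so a column the code has never heard of still survives a round trip. The `tables_dict` contents, the `compress_dict`/`decompress_dict` helpers, and the use of `zlib` with stdlib `array` objects are all assumptions for illustration; the real dict would come from tskit, and the real code would build Column objects with its own compressor.

```python
import zlib
from array import array

# Hypothetical stand-in for the nested dict form of a tree sequence's tables.
tables_dict = {
    "sequence_length": 100.0,
    "nodes": {
        "time": array("d", [0.0, 0.0, 1.5]),
        "flags": array("I", [1, 1, 0]),
    },
    "edges": {
        "left": array("d", [0.0, 0.0]),
        "right": array("d", [100.0, 100.0]),
        "parent": array("i", [2, 2]),
        "child": array("i", [0, 1]),
    },
    # A column this code knows nothing about, as a future format might add.
    "future_table": {"new_column": array("q", [7, 8, 9])},
}


def compress_dict(d):
    """Recursively compress every array in d; keep scalars as they are."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = compress_dict(value)
        elif isinstance(value, array):
            # Store the typecode so the column can be rebuilt exactly.
            out[key] = ("array", value.typecode, zlib.compress(value.tobytes()))
        else:
            out[key] = ("scalar", value)
    return out


def decompress_dict(d):
    """Inverse of compress_dict."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = decompress_dict(value)
        elif value[0] == "array":
            _, typecode, payload = value
            column = array(typecode)
            column.frombytes(zlib.decompress(payload))
            out[key] = column
        else:
            out[key] = value[1]
    return out


roundtrip = decompress_dict(compress_dict(tables_dict))
```

Nothing in the loop names a column, so `future_table` is stored without any code change. Special cases for particular columns could be added as a lookup keyed on the column name that overrides the default compression.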

brianzhang01 · 2021-04-08T12:13:26Z

If the format of written tsz files becomes different, it would be good to offer scripts to update old tsz files to this new format.

jeromekelleher · 2021-04-08T12:56:14Z

We'll try to keep the same format if possible @brianzhang01, but if a different format is needed we'll make sure the old one is still transparently supported (so there's no need for format upgrades).
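One way "transparently supported" could look in practice is a reader that dispatches on a stored version number, so old files load without an upgrade script. This is only a sketch under assumptions: the `format_version` field, the JSON container, and both layouts below are hypothetical and are not the actual tsz file format.

```python
import json


def _read_v1(payload):
    # Hypothetical older layout: a fixed set of named columns at the top level.
    return {"nodes": {"time": payload["node_time"]}}


def _read_v2(payload):
    # Hypothetical newer layout: the nested dict is stored directly.
    return payload["tables"]


_READERS = {1: _read_v1, 2: _read_v2}


def load(text):
    """Load either layout and return the same in-memory structure."""
    doc = json.loads(text)
    # Files written before versioning existed are treated as version 1.
    version = doc.get("format_version", 1)
    try:
        reader = _READERS[version]
    except KeyError:
        raise ValueError(f"Unsupported format version: {version}") from None
    return reader(doc["payload"])


old_file = json.dumps(
    {"format_version": 1, "payload": {"node_time": [0.0, 1.5]}}
)
new_file = json.dumps(
    {"format_version": 2, "payload": {"tables": {"nodes": {"time": [0.0, 1.5]}}}}
)
```

Both `load(old_file)` and `load(new_file)` return the same nested dict, so callers never need to know which layout was on disk.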

jeromekelleher mentioned this issue Apr 8, 2021

Compress top-level metadata #35

Merged

jeromekelleher added this to the Version 0.2.0 milestone Apr 8, 2021

benjeffery mentioned this issue Apr 21, 2021

Use dict representation for compress and decompress #42

Merged

mergify bot closed this as completed in #42 Apr 22, 2021
