Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? #4081

Closed
shoyer opened this issue May 19, 2020 · 4 comments

@shoyer
Member

shoyer commented May 19, 2020

Here's an example of a large dataset from @alimanfoo:
https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764

<xarray.Dataset>
Dimensions:                         (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865)
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...

I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output.

That's a very long first line! This would be easier to read as:

<xarray.Dataset>
Dimensions:                         (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3,
                                     __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2,
                                     samples: 1142, variants: 21442865)
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
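
For illustration, the wrapped layout above could be produced by greedily packing `name: size` entries onto lines up to a maximum width. This is only a sketch, not xarray's formatting code: the helper name `wrap_dims`, the 36-character label column and the 100-character width are assumptions chosen to match the example.

```python
def wrap_dims(dims, label="Dimensions:", col_width=36, max_width=100):
    """Format dims as '(name: size, ...)', wrapping under the opening paren.

    Hypothetical helper for illustration; not part of xarray.
    """
    first_prefix = label.ljust(col_width) + "("
    cont_prefix = " " * len(first_prefix)  # continuation lines align after "("
    entries = [f"{name}: {size}" for name, size in dims.items()]

    lines = []
    current = first_prefix
    line_has_entry = False
    for i, entry in enumerate(entries):
        is_last = i == len(entries) - 1
        piece = entry + (")" if is_last else ",")
        sep = " " if line_has_entry else ""
        # Start a new line only if the current one already holds an entry
        # and adding this piece would exceed the maximum width.
        if line_has_entry and len(current) + len(sep) + len(piece) > max_width:
            lines.append(current)
            current = cont_prefix + piece
        else:
            current += sep + piece
        line_has_entry = True
    if not entries:
        current += ")"
    lines.append(current)
    return "\n".join(lines)


# Dimension sizes copied from the repr above.
dims = {
    "__variants/BaseCounts_dim1": 4,
    "__variants/MLEAC_dim1": 3,
    "__variants/MLEAF_dim1": 3,
    "alt_alleles": 3,
    "ploidy": 2,
    "samples": 1142,
    "variants": 21442865,
}
wrapped = wrap_dims(dims)
print(wrapped)
```

Each `name: size` pair is kept intact, so a break never falls between a dimension name and its length.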

or maybe:

<xarray.Dataset>
Dimensions:
    __variants/BaseCounts_dim1: 4
    __variants/MLEAC_dim1: 3
    __variants/MLEAF_dim1: 3
    alt_alleles: 3
    ploidy: 2
    samples: 1142
    variants: 21442865
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
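
The one-per-line variant is simpler to generate. A minimal sketch, assuming a hypothetical helper `dims_one_per_line` and a four-space indent to match the other sections of the repr (again, not xarray's actual code):

```python
def dims_one_per_line(dims, label="Dimensions:", indent=4):
    """Format each dimension on its own indented line under the label.

    Hypothetical helper for illustration; not part of xarray.
    """
    pad = " " * indent
    rows = [f"{pad}{name}: {size}" for name, size in dims.items()]
    return "\n".join([label] + rows)


# Dimension sizes copied from the repr above.
dims = {
    "__variants/BaseCounts_dim1": 4,
    "__variants/MLEAC_dim1": 3,
    "__variants/MLEAF_dim1": 3,
    "alt_alleles": 3,
    "ploidy": 2,
    "samples": 1142,
    "variants": 21442865,
}
one_per_line = dims_one_per_line(dims)
print(one_per_line)
```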

Dimensions without coordinates could probably use some wrapping, too.
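
As a rough sketch of what that could look like, the standard library's `textwrap.fill` can wrap the comma-separated names with a hanging indent. The helper name `wrap_unindexed_dims` and the 80-character width are assumptions for illustration, not xarray behaviour.

```python
import textwrap


def wrap_unindexed_dims(names, label="Dimensions without coordinates: ", max_width=80):
    """Wrap a comma-separated list of dimension names under the label.

    Hypothetical helper for illustration; not part of xarray.
    Note: a dimension name containing spaces could be split across lines.
    """
    return textwrap.fill(
        ", ".join(names),
        width=max_width,
        initial_indent=label,
        subsequent_indent=" " * len(label),  # hanging indent under the first name
        break_long_words=False,  # never cut a long name in half
        break_on_hyphens=False,  # do not break inside hyphenated names
    )


# Names copied from the repr above.
names = [
    "__variants/BaseCounts_dim1",
    "__variants/MLEAC_dim1",
    "__variants/MLEAF_dim1",
    "alt_alleles",
    "ploidy",
    "samples",
    "variants",
]
wrapped_names = wrap_unindexed_dims(names)
print(wrapped_names)
```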

@alimanfoo
Contributor

Thanks @shoyer for raising this. It would be nice to wrap the dimensions; I'd vote for one per line.

@max-sixty
Collaborator

Agree with @alimanfoo !

Maybe (eventually, second priority) with the dim lengths aligned. Or do we end up with a table-within-a-table then?
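
To make the aligned idea concrete, here is a small sketch that pads names to a common width and right-justifies the lengths. The helper `dims_aligned` is hypothetical and only illustrates the layout being discussed.

```python
def dims_aligned(dims, label="Dimensions:", indent=4):
    """One dimension per line, with the lengths right-aligned in a column.

    Hypothetical helper for illustration; not part of xarray.
    """
    pad = " " * indent
    # Width of the widest "name:" cell and of the widest length.
    name_width = max((len(f"{name}:") for name in dims), default=0)
    size_width = max((len(str(size)) for size in dims.values()), default=0)
    rows = [
        f"{pad}{(str(name) + ':').ljust(name_width)} {str(size).rjust(size_width)}"
        for name, size in dims.items()
    ]
    return "\n".join([label] + rows)


# Dimension sizes copied from the repr in the first comment.
dims = {
    "__variants/BaseCounts_dim1": 4,
    "__variants/MLEAC_dim1": 3,
    "__variants/MLEAF_dim1": 3,
    "alt_alleles": 3,
    "ploidy": 2,
    "samples": 1142,
    "variants": 21442865,
}
aligned = dims_aligned(dims)
print(aligned)
```

Every row ends up the same width, which is the "table-within-a-table" effect mentioned above.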

@stale

stale bot commented Apr 28, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Apr 28, 2022
@dcherian dcherian removed the stale label Apr 29, 2022
@Illviljan
Contributor

Done in #5662.
