
Group and Array repr performance #132

Closed
@alimanfoo


Generating a repr for a group can currently be slow, for a quirky reason: the repr checks whether the chunk_store differs from the store, so that it can add an extra line showing the chunk store when they differ. If the store is a shelve, that check triggers a complete comparison of store contents. E.g.:

In [37]: %lprun -f zarr.hierarchy.Group.__repr__ repr(g)
Timer unit: 1e-06 s

Total time: 5.60861 s
File: /home/aliman/miniconda3/envs/biipy_20170126_py35/lib/python3.5/site-packages/zarr/hierarchy.py
Function: __repr__ at line 194

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   194                                               def __repr__(self):
   195                                           
   196                                                   # main line
   197         1           11     11.0      0.0          r = '%s(' % type(self).__name__
   198         1           28     28.0      0.0          r += self.name + ', '
   199         1       276958 276958.0      4.9          r += str(len(self))
   200         1            2      2.0      0.0          r += ')'
   201                                           
   202                                                   # members
   203         1       244629 244629.0      4.4          array_keys = list(self.array_keys())
   204         1            2      2.0      0.0          if array_keys:
   205         1            1      1.0      0.0              arrays_line = '\n  arrays: %s; %s' % \
   206         1            8      8.0      0.0                  (len(array_keys), ', '.join(array_keys))
   207         1            1      1.0      0.0              if len(arrays_line) > 80:
   208         1            1      1.0      0.0                  arrays_line = arrays_line[:77] + '...'
   209         1            1      1.0      0.0              r += arrays_line
   210         1       240623 240623.0      4.3          group_keys = list(self.group_keys())
   211         1            3      3.0      0.0          if group_keys:
   212                                                       groups_line = '\n  groups: %s; %s' % \
   213                                                           (len(group_keys), ', '.join(group_keys))
   214                                                       if len(groups_line) > 80:
   215                                                           groups_line = groups_line[:77] + '...'
   216                                                       r += groups_line
   217                                           
   218                                                   # storage and synchronizer classes
   219         1           10     10.0      0.0          r += '\n  store: %s' % type(self.store).__name__
   220         1      4846320 4846320.0     86.4          if self.store != self.chunk_store:
   221                                                       r += '; chunk_store: %s' % type(self.chunk_store).__name__
   222         1           13     13.0      0.0          if self.synchronizer is not None:
   223                                                       r += '; synchronizer: %s' % type(self.synchronizer).__name__
   224                                           
   225         1            1      1.0      0.0          return r
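
The cost on line 220 is consistent with how the standard library's `collections.abc.Mapping` implements equality: it builds a dict from each side's items and compares those, so every value is read. A minimal sketch of that behaviour follows. `CountingStore` is a hypothetical dict-backed stand-in for a shelve-like store, not a zarr class.

```python
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical store that counts how many values are read."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # each value lookup is counted
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = CountingStore({str(i): b"x" for i in range(1000)})
chunk_store = store  # no separate chunk store: same object

# Mapping.__eq__ (used by !=) has no identity shortcut, so this
# reads the whole store even though both names refer to one object.
differs = store != chunk_store
print(differs, store.reads)
```

With a disk-backed store, each of those reads is a real I/O operation, which would explain why the comparison dominates the profile.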

There should be a way to avoid this comparison, at least in the case where chunk_store is None when the Group is instantiated, because then it is known there is no separate chunk store.
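
One way this could look is sketched below, assuming the group falls back to the main store when `chunk_store` is None. `GroupSketch` is a hypothetical stand-in reduced to the repr logic, not the actual `zarr.hierarchy.Group`. It replaces the `!=` comparison with an identity check, which never touches store contents.

```python
class GroupSketch:
    """Hypothetical stand-in for Group, keeping only the store-related repr lines."""

    def __init__(self, store, chunk_store=None, synchronizer=None):
        self.store = store
        # Assumption: with no separate chunk store, reuse the main store object.
        self.chunk_store = store if chunk_store is None else chunk_store
        self.synchronizer = synchronizer

    def __repr__(self):
        r = type(self).__name__
        r += '\n  store: %s' % type(self.store).__name__
        # Identity check: constant time, no comparison of store contents.
        if self.chunk_store is not self.store:
            r += '; chunk_store: %s' % type(self.chunk_store).__name__
        if self.synchronizer is not None:
            r += '; synchronizer: %s' % type(self.synchronizer).__name__
        return r


print(repr(GroupSketch({})))                  # no chunk_store shown
print(repr(GroupSketch({}, chunk_store={})))  # chunk_store shown
```

The trade-off is a small change in semantics: two distinct store objects with equal contents would now be reported as separate stores, whereas the `!=` comparison would have treated them as the same.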
