Group and Array repr performance #132

Closed
alimanfoo opened this issue Feb 26, 2017 · 2 comments · Fixed by #148
Labels
performance Potential issues with Zarr performance (I/O, memory, etc.) release notes done Automatically applied to PRs which have release notes.
Comments

@alimanfoo
Member

Generating a repr for a group can currently be slow, for a quirky reason. The repr needs to decide whether the chunk_store differs from the store, so that it can add an extra line showing the chunk store when it does, and this check triggers a complete comparison of store contents if the store is a shelve. E.g.:

In [37]: %lprun -f zarr.hierarchy.Group.__repr__ repr(g)
Timer unit: 1e-06 s

Total time: 5.60861 s
File: /home/aliman/miniconda3/envs/biipy_20170126_py35/lib/python3.5/site-packages/zarr/hierarchy.py
Function: __repr__ at line 194

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   194                                               def __repr__(self):
   195                                           
   196                                                   # main line
   197         1           11     11.0      0.0          r = '%s(' % type(self).__name__
   198         1           28     28.0      0.0          r += self.name + ', '
   199         1       276958 276958.0      4.9          r += str(len(self))
   200         1            2      2.0      0.0          r += ')'
   201                                           
   202                                                   # members
   203         1       244629 244629.0      4.4          array_keys = list(self.array_keys())
   204         1            2      2.0      0.0          if array_keys:
   205         1            1      1.0      0.0              arrays_line = '\n  arrays: %s; %s' % \
   206         1            8      8.0      0.0                  (len(array_keys), ', '.join(array_keys))
   207         1            1      1.0      0.0              if len(arrays_line) > 80:
   208         1            1      1.0      0.0                  arrays_line = arrays_line[:77] + '...'
   209         1            1      1.0      0.0              r += arrays_line
   210         1       240623 240623.0      4.3          group_keys = list(self.group_keys())
   211         1            3      3.0      0.0          if group_keys:
   212                                                       groups_line = '\n  groups: %s; %s' % \
   213                                                           (len(group_keys), ', '.join(group_keys))
   214                                                       if len(groups_line) > 80:
   215                                                           groups_line = groups_line[:77] + '...'
   216                                                       r += groups_line
   217                                           
   218                                                   # storage and synchronizer classes
   219         1           10     10.0      0.0          r += '\n  store: %s' % type(self.store).__name__
   220         1      4846320 4846320.0     86.4          if self.store != self.chunk_store:
   221                                                       r += '; chunk_store: %s' % type(self.chunk_store).__name__
   222         1           13     13.0      0.0          if self.synchronizer is not None:
   223                                                       r += '; synchronizer: %s' % type(self.synchronizer).__name__
   224                                           
   225         1            1      1.0      0.0          return r

There should be a way to avoid this comparison, at least in the case where chunk_store is None when the Group is instantiated, because then it is known there is no separate chunk store.
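To illustrate why the comparison is expensive, here is a minimal sketch, not zarr code. It assumes the store inherits equality from collections.abc.Mapping, as shelve.Shelf does. CountingStore is a hypothetical class made up for this example. Mapping.__eq__ builds a dict from each side's items, so `!=` reads every value from both operands, whereas an identity check reads nothing.

```python
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical dict-backed store that counts value reads (not part of zarr)."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # every value lookup is counted
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = CountingStore({'chunk.%d' % i: b'x' for i in range(1000)})
chunk_store = store  # same object, as when no separate chunk store was given

# Inherited Mapping.__eq__ copies the items of both operands into dicts,
# so every value is read once per side even though the objects are identical.
differs = store != chunk_store
reads_after_ne = store.reads

# An identity check does not touch the store contents at all.
is_separate = store is not chunk_store
reads_after_identity = store.reads - reads_after_ne

print(differs, reads_after_ne, is_separate, reads_after_identity)
```

With a disk-backed store each of those reads is an I/O operation, which would be consistent with the time spent on the comparison line in the profile above.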

@alimanfoo alimanfoo added this to the v2.2 milestone Feb 26, 2017
@alimanfoo alimanfoo changed the title Group repr performance Group and Array repr performance Feb 26, 2017
@alimanfoo
Member Author

Also there are two pain points for Array.__repr__, the nbytes_stored lookup and the same store comparison:

Timer unit: 1e-06 s

Total time: 9.77442 s
File: /home/aliman/miniconda3/envs/biipy_20170126_py35/lib/python3.5/site-packages/zarr/core.py
Function: _repr_nosync at line 816

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   816                                               def _repr_nosync(self):
   817                                           
   818                                                   # main line
   819         1           40     40.0      0.0          r = '%s(' % type(self).__name__
   820         1           19     19.0      0.0          if self.name:
   821         1           10     10.0      0.0              r += '%s, ' % self.name
   822         1           13     13.0      0.0          r += '%s, ' % str(self._shape)
   823         1            7      7.0      0.0          r += '%s, ' % str(self._dtype)
   824         1            6      6.0      0.0          r += 'chunks=%s, ' % str(self._chunks)
   825         1            4      4.0      0.0          r += 'order=%s' % self._order
   826         1            2      2.0      0.0          r += ')'
   827                                           
   828                                                   # storage size info
   829         1           42     42.0      0.0          r += '\n  nbytes: %s' % human_readable_size(self._nbytes)
   830         1      4854174 4854174.0     49.7          if self.nbytes_stored > 0:
   831                                                       r += '; nbytes_stored: %s' % human_readable_size(
   832                                                           self.nbytes_stored)
   833                                                       r += '; ratio: %.1f' % (self._nbytes / self.nbytes_stored)
   834         1       130277 130277.0      1.3          r += '; initialized: %s/%s' % (self.nchunks_initialized,
   835         1           33     33.0      0.0                                         self._nchunks)
   836                                           
   837                                                   # filters
   838         1            2      2.0      0.0          if self._filters:
   839                                                       # first line
   840                                                       r += '\n  filters: %r' % self._filters[0]
   841                                                       # subsequent lines
   842                                                       for f in self._filters[1:]:
   843                                                           r += '\n           %r' % f
   844                                           
   845                                                   # compressor
   846         1            1      1.0      0.0          if self._compressor:
   847         1           10     10.0      0.0              r += '\n  compressor: %r' % self._compressor
   848                                           
   849                                                   # storage and synchronizer classes
   850         1            2      2.0      0.0          r += '\n  store: %s' % type(self._store).__name__
   851         1      4789776 4789776.0     49.0          if self._store != self._chunk_store:
   852                                                       r += '; chunk_store: %s' % type(self._chunk_store).__name__
   853         1            4      4.0      0.0          if self._synchronizer is not None:
   854                                                       r += '; synchronizer: %s' % type(self._synchronizer).__name__
   855                                           

@alimanfoo
Member Author

Should be straightforward to fix, by setting the _chunk_store attribute to None when no separate chunk store is given.
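A rough sketch of that idea, under stated assumptions: Node is a hypothetical stand-in for Group/Array, not the actual zarr classes, and the change merged for this issue may look different. The private attribute stays None unless a chunk store is passed in, a property falls back to the main store for chunk access, and the repr only tests for None.

```python
class Node:
    """Hypothetical stand-in for Group/Array; not zarr's actual implementation."""

    def __init__(self, store, chunk_store=None):
        self._store = store
        # Assumption: keep None when the caller gave no separate chunk store,
        # rather than aliasing it to the main store.
        self._chunk_store = chunk_store

    @property
    def chunk_store(self):
        # Chunk access falls back to the main store when none was given.
        return self._store if self._chunk_store is None else self._chunk_store

    def __repr__(self):
        r = type(self).__name__
        r += '\n  store: %s' % type(self._store).__name__
        # Cheap None test replaces the store != chunk_store comparison.
        if self._chunk_store is not None:
            r += '; chunk_store: %s' % type(self._chunk_store).__name__
        return r


single = Node({})
split = Node({}, chunk_store={})
print(repr(single))
print(repr(split))
```

One consequence of this sketch is that passing the same object as both store and chunk_store would still print the chunk_store line. Comparing with `is not` instead of testing for None would cover that case too.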

@alimanfoo alimanfoo mentioned this issue Apr 6, 2017
@alimanfoo alimanfoo mentioned this issue Apr 26, 2017
@alimanfoo alimanfoo added performance Potential issues with Zarr performance (I/O, memory, etc.) release notes done Automatically applied to PRs which have release notes. labels Nov 20, 2017