Skip to content

Problems with zarr 2.11.0 #628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
frezaei98 opened this issue Feb 10, 2022 · 20 comments · Fixed by #634
Closed

Problems with zarr 2.11.0 #628

frezaei98 opened this issue Feb 10, 2022 · 20 comments · Fixed by #634
Labels

Comments

@frezaei98
Copy link

From Tuesday, this code "inferred_ts = tsinfer.infer(sample_data)" has not run suddenly. I am facing this error " TypeError: 'int' object is not a mapping". What can I do to fix this error?

@jeromekelleher
Copy link
Member

Hi @frezaei98 👋, thanks for the bug report.

Can you tell us what version you're using (is it git or from conda or pip)? A minimal working example would be ideal, with a small sample data file.

@hyanwong
Copy link
Member

hyanwong commented Feb 10, 2022

That is strange. Nothing has changed in the code. Do you know what line number is giving the error. Could you post a simple test case (with data) that reproduces the error for you?

(snap, Jerome, sorry!)

@frezaei98
Copy link
Author

Can you tell us what version you're using (is it git or from conda or pip)? A minimal working example would be ideal, with a small sample data file.

Hi,
Thank you very much for your respond.
I am using the conda. But I also checked it in Google colab with pip .
I tried to run the toy example in tsinfer.readthedocs.io, but it has same error.

@frezaei98
Copy link
Author

I tried to run the toy example in tsinfer.readthedocs.io, but it has same error.

Hi,
I tried to run the toy example in tsinfer.readthedocs.io, but it has same error.

@jeromekelleher
Copy link
Member

I can confirm this happens in a fresh venv... investigating.

@jeromekelleher
Copy link
Member

Very strange:

(venv) jk@tempest$ python3 tsinfer-example.py 
Traceback (most recent call last):
  File "tsinfer-example.py", line 12, in <module>
    inferred_ts = tsinfer.infer(sample_data)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 291, in infer
    ancestors_ts = match_ancestors(
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 449, in match_ancestors
    return matcher.match_ancestors()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1401, in match_ancestors
    ts = self.store_output()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1477, in store_output
    ts = self.get_ancestors_tree_sequence()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1448, in get_ancestors_tree_sequence
    self.convert_inference_mutations(tables)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1232, in convert_inference_mutations
    metadata = _update_site_metadata(site.metadata, constants.INFERENCE_FULL)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 150, in _update_site_metadata
    return {"inference_type": inference_type, **current_metadata}
TypeError: 'int' object is not a mapping

Somehow we're setting the metadata to 0 for sites when I run the version from a pip installation, but not when working with the git version. Very strange...

@frezaei98
Copy link
Author

I can confirm this happens in a fresh venv... investigating.

I have been working on tsinfer for three or four months and it was very strange that it was right until Monday, but on Tuesday I want to continue my work, I got an error....

@frezaei98
Copy link
Author

Very strange:

(venv) jk@tempest$ python3 tsinfer-example.py 
Traceback (most recent call last):
  File "tsinfer-example.py", line 12, in <module>
    inferred_ts = tsinfer.infer(sample_data)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 291, in infer
    ancestors_ts = match_ancestors(
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 449, in match_ancestors
    return matcher.match_ancestors()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1401, in match_ancestors
    ts = self.store_output()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1477, in store_output
    ts = self.get_ancestors_tree_sequence()
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1448, in get_ancestors_tree_sequence
    self.convert_inference_mutations(tables)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 1232, in convert_inference_mutations
    metadata = _update_site_metadata(site.metadata, constants.INFERENCE_FULL)
  File "/home/jk/tmp/venv/lib/python3.8/site-packages/tsinfer/inference.py", line 150, in _update_site_metadata
    return {"inference_type": inference_type, **current_metadata}
TypeError: 'int' object is not a mapping

Somehow we're setting the metadata to 0 for sites when I run the version from a pip installation, but not when working with the git version. Very strange...

@jeromekelleher
Copy link
Member

I'm going to reopen this @frezaei98 - there's definitely something wrong here. I'm not sure if it's directly a bug in tsinfer or some weird interaction with an upstream package, but it's something we need to get to the bottom of.

@jeromekelleher
Copy link
Member

I've tracked this down to a problem with Zarr @frezaei98. Something changed in the behaviour of metadata with version 2.11.

MWE:

import tsinfer

with tsinfer.SampleData(sequence_length=6) as sample_data:
    sample_data.add_site(0, [0, 1, 0, 0, 0], ["A", "T"])

for site in sample_data.sites():
    print(site)

with version 2.10 we get:

Site(id=0, position=0.0, ancestral_state='A', metadata={}, time=nan, alleles=('A', 'T'))

and version 2.11 we have

Site(id=0, position=0.0, ancestral_state='A', metadata=0, time=nan, alleles=('A', 'T'))

Note the metadata=0 rather than metadata={}.

The short term fix is to install zarr version 2.10, e.g. python3 -m pip install zarr==2.10.

I'm not sure if this is a bug in zarr or whether we were depending on some undocumented/dodgy behaviour. I guess we need to figure this out before we decide what the right approach is.

@hyanwong
Copy link
Member

Thanks for tracking this down @jeromekelleher. Good bug hunting there.

@benjeffery
Copy link
Member

benjeffery commented Feb 14, 2022

Bisected to zarr-developers/zarr-python@f461eb7 still figuring out which bit of it! Seems related to handling of empty chunks.

@hyanwong
Copy link
Member

Good luck @benjeffery !

@jeromekelleher
Copy link
Member

Aha, fantastic, thanks @benjeffery! I bet it's the write_empty_chunks=False default that's hit us, and we just have to set that to True explicitly. I'll open a PR to test.

@benjeffery
Copy link
Member

Aha, fantastic, thanks @benjeffery! I bet it's the write_empty_chunks=False default that's hit us, and we just have to set that to True explicitly. I'll open a PR to test.

Yep, I expect so. Was just trying to find the spot in tsinfer.

@benjeffery
Copy link
Member

@jeromekelleher adding write_empty_chunks = True at formats.py:1059 hasn't worked for me. The only other place that default value is used is on read-back internally in zarr (hierarchy.py:355), where we have no control. Adding the param there does fix.

I think this is a PR to zarr, will write up.

@jeromekelleher
Copy link
Member

Yeah, I tried to do a quick fix over in #631, which didn't work. Thanks for following up!

@benjeffery
Copy link
Member

Reported upstream zarr-developers/zarr-python#965

@jeromekelleher jeromekelleher changed the title tsinfer.infer() dose not work. Problems with zarr 2.11.0 Feb 15, 2022
@frezaei98
Copy link
Author

.

I've tracked this down to a problem with Zarr @frezaei98. Something changed in the behaviour of metadata with version 2.11.

MWE:

import tsinfer

with tsinfer.SampleData(sequence_length=6) as sample_data:
    sample_data.add_site(0, [0, 1, 0, 0, 0], ["A", "T"])

for site in sample_data.sites():
    print(site)

with version 2.10 we get:

Site(id=0, position=0.0, ancestral_state='A', metadata={}, time=nan, alleles=('A', 'T'))

and version 2.11 we have

Site(id=0, position=0.0, ancestral_state='A', metadata=0, time=nan, alleles=('A', 'T'))

Note the metadata=0 rather than metadata={}.

The short term fix is to install zarr version 2.10, e.g. python3 -m pip install zarr==2.10.

I'm not sure if this is a bug in zarr or whether we were depending on some undocumented/dodgy behaviour. I guess we need to figure this out before we decide what the right approach is.

Many thanks. I changed the version of Zarr package and it fixed and worked.

@benjeffery
Copy link
Member

Here's an additional zarr issue that we hit in this release:
zarr-developers/zarr-python#967

percyfal added a commit to percyfal/tswf that referenced this issue Mar 11, 2022
percyfal added a commit to percyfal/tswf that referenced this issue Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants