Skip to content

ENH: Add static and dynamic dtype aliases to NIfTI images #1096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 3, 2022

Conversation

effigies
Copy link
Member

@effigies effigies commented Mar 19, 2022

This PR adds dtype aliases that can be passed in three ways:

  • Nifti1Image(data, affine[, header], dtype=alias)
  • img.set_data_dtype(alias)
  • img.to_filename(fname, dtype=alias) (and likewise img.to_bytes(fname, dtype=alias))

Aliases

'mask'

This is a static alias for uint8.

'smallest'

This requires an array to be integer data. It then selects the smallest dtype among the Analyze-compatible uint8, int16, int32 set.

'compat'

This is currently 'smallest' for integer data and float32 for floating point data (assuming no values are out of range). I'm not entirely pleased with this. Here's another option that I'd like opinions on before going ahead and implementing:

  • No change for uint8, int16, int32 or float32
  • int8 -> uint8 if min >= 0 else int16
  • uint16 -> int16 if max < 32767 else int32
  • All other ints become int32, raising an error on out-of-range values
  • All floats become float32, raising an error on out-of-range values

This would have has the advantage that arrays with Analyze-compatible types won't need their values inspected. The arguable disadvantage is that masks that are int64s will become int32 instead of uint8, but this mode is not intended to be smart.


Semantics

>>> img.set_data_dtype('mask')
>>> img.header.set_data_dtype('mask')
Exception ...

Because the value of the final dtype depends on the values in the data array, this is implemented via img.get/set_data_dtype(), and not img.header.get/set_data_dtype(). This may end up being surprising for people who have used them interchangeably, but should not affect existing code that does not use aliases. If this seems problematic, we could either set a flag to warn if the header methods are being used on a header associated with an image that uses aliases. Alternately, we could monkey-patch the header to use the image methods, restoring an expectation that these are equivalent.


Pinging @neurolabusc @jbteves @jeromedockes for feedback.

This PR builds on #1082. If you intend to review code, I would suggest doing it in a per-commit fashion.

@effigies effigies force-pushed the enh/dtype_aliases branch from b823f8d to 423ee28 Compare March 19, 2022 22:55
@codecov
Copy link

codecov bot commented Mar 19, 2022

Codecov Report

Merging #1096 (58d37a2) into master (d0532ec) will increase coverage by 0.01%.
The diff coverage is 96.49%.

@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
+ Coverage   92.26%   92.28%   +0.01%     
==========================================
  Files         100      100              
  Lines       12269    12326      +57     
  Branches     2399     2416      +17     
==========================================
+ Hits        11320    11375      +55     
- Misses        624      625       +1     
- Partials      325      326       +1     
Impacted Files Coverage Δ
nibabel/nifti1.py 92.57% <96.49%> (+0.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0532ec...58d37a2. Read the comment docs.

@effigies effigies force-pushed the enh/dtype_aliases branch from 423ee28 to 88480c6 Compare March 20, 2022 01:37
@jeromedockes
Copy link
Contributor

Thanks! I think that makes sense. About the names, it is not obvious that "smallest" implies integers, and "compat" stands for compatibility with other tools? why are float64 not allowed?

@effigies
Copy link
Member Author

"compat" is "Analyze-compatible", which ensures maximum compatibility with other tools (following this table). If someone wants their data to be float64, they don't need to use "compat" or any other dynamic dtype at all, so I'm not sure the overlap of "give me a compatible type" and "I have float64 data and don't mind keeping it that way" is that large. Were you thinking of a specific use-case?

Agreed that "smallest" isn't obviously int. I'm okay changing it to something else that would be clearer.

@jeromedockes
Copy link
Contributor

jeromedockes commented Mar 28, 2022 via email

@effigies
Copy link
Member Author

effigies commented Jun 3, 2022

@neurolabusc @jbteves Could I bug you for review of the API here, if not the code? I don't want to add unintuitive aliases and then go through a process of encouraging people to move to better ones later. If we can't settle on something satisfactory now, I'd rather push this off to another release.

@neurolabusc
Copy link

@effigies I think this is great. I think this will help interchange of NIfTI data between tools.

One feature that I am an advocate of, but where others likely have different opinions is uint16 -> int16 if max < 32767 else int32. Note that AFNI will promote uint16 to its native float32 while preserving int16 (since it is a native format). AFNI will fail if one analyzes two (or more) fMRI series where one series is saved as int16 and the other is uint16. Perhaps @mrneont, @afni-dglen, @afni-rickr have thoughts on this (and perhaps AFNI has been upgraded to avoid this issue since 2019.

I have in general tried to make the output of dcm2niix consistent across versions, but the default behavior of dcm2niix was updated to attempt to help AFNI users. Current versions of dcm2niix use -l o by default, while prior versions defaulted to -l n:

-l : losslessly scale 16-bit integers to use dynamic range (y/n/o [yes=scale, no=no, but uint16->int16, o=original], default o)

To reiterate, I support your proposed setting, I just think we should get a consensus from the AFNI team.

>>> img.get_data_dtype() == np.dtype('int64')
True
"""
# Numpy dtype comparison can fail in odd ways, check for aliases only if str
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a reference to somewhere where these odd failures are described since these comparisons may change over time. Certainly nothing worth blocking this PR over, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to be more explicit. Thanks for the feedback!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect, thank you!

@jbteves
Copy link

jbteves commented Jun 3, 2022

I think this looks good (and sorry for not replying to your initial request for comment on this PR). I have the same concern as Chris so I'd like to see AFNI team members weigh in as well.

@effigies effigies force-pushed the enh/dtype_aliases branch from 33dcfd0 to 0978eed Compare June 3, 2022 17:23
@pep8speaks
Copy link

pep8speaks commented Jun 3, 2022

Hello @effigies, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2022-06-03 18:19:31 UTC

@effigies effigies force-pushed the enh/dtype_aliases branch from b59aaf9 to 3dc9604 Compare June 3, 2022 17:29
@effigies effigies force-pushed the enh/dtype_aliases branch 2 times, most recently from 9827468 to 8edc25e Compare June 3, 2022 17:53
@effigies effigies force-pushed the enh/dtype_aliases branch from 8edc25e to a8b6ff9 Compare June 3, 2022 17:54
@effigies
Copy link
Member Author

effigies commented Jun 3, 2022

Thanks for the reviews @neurolabusc and @jbteves!

One feature that I am an advocate of, but where others likely have different opinions is uint16 -> int16 if max < 32767 else int32. Note that AFNI will promote uint16 to its native float32 while preserving int16 (since it is a native format).

So I definitely don't want to expand uint16 into float32 as opposed to int32, as the integer nature of the data would be lost. I'm content to be inconsistent with AFNI here.

AFNI will fail if one analyzes two (or more) fMRI series where one series is saved as int16 and the other is uint16.

As you say that seems to be fixed in recent AFNI. I think the case where we might induce inconsistent types would be something like this:

img = nb.load("uint16.nii.gz")
for i in range(img.shape[3]):
    new_img = nb.Nifti1Image(img.dataobj[..., i], img.affine, img.header, dtype="compat")
    new_img.to_filename(f"vol{i:03d}.nii.gz")

Here we could get some volumes that become int16 and some that become int32, which might annoy a downstream tool that wants to re-concatenate. I think the "fix" for that would probably be an alternative compat mode where we go purely by safe casting, so int8 becomes int16, uint16 becomes int32. But I think we can push that off for now. This puts in the infrastructure and adding new modes will be a simple task if the situation calls for it.

Unless there are any objections, I will merge this by EOD.

@effigies effigies merged commit 1312493 into nipy:master Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants