Skip to content

RF+ENH: nib-diff - allow to specify absolute and/or relative maximal diff to tolerate #661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Sep 14, 2018

So now it should be possible to get an idea on how much data in the given files differs:

$> nib-diff --ma 0.000001 --mr .001 ./tests-run/output/./sub-1_T1w_5mm_noise_corrected.nii.gz /tmp/sub-1_T1w_5mm_noise_corrected.nii.gz
These files are different.
Field          1:sub-1_T1w_5mm_noise_corrected.nii.gz                   2:sub-1_T1w_5mm_noise_corrected.nii.gz
DATA(md5)      65df09c06b236342eaf7e2fe57aabf55                       3c6e9069e6e054e714f2894419848df0
DATA(diff 1:)  -                                                      abs: 7.6293945e-06, rel: 0.002224694

and could be applied to >2 files as well

TODOs

  • seek feedback regarding naming of parameters (e.g. data_max_abs_diff) - they are kinda mouthful but wanted to be specific happen we add some later on dedicated to header etc
  • seek feedback regarding output formatting
  • seek feedback regarding "logic" - ATM specification of --ma would also affect estimation of relative difference. I did it on purpose so we could easily avoid divisions in tiny numbers, which could reach high values. But may be that is undesired?
  • add tests

attn @chrispycheng

…differences to tolerate

So now it should be possible to get an idea on how much data in the given files differs:

    $> nib-diff --ma 0.000001 --mr .001 ./tests-run/output/./sub-1_T1w_5mm_noise_corrected.nii.gz /tmp/sub-1_T1w_5mm_noise_corrected.nii.gz
    These files are different.
    Field          1:sub-1_T1w_5mm_noise_corrected.nii.gz                   2:sub-1_T1w_5mm_noise_corrected.nii.gz
    DATA(md5)      65df09c06b236342eaf7e2fe57aabf55                       3c6e9069e6e054e714f2894419848df0
    DATA(diff 1:)  -                                                      abs: 7.6293945e-06, rel: 0.002224694
@chrispycheng
Copy link
Contributor

Some style stuff to get out of the way:

nibabel/cmdline/diff.py:164:29: E226 missing whitespace around arithmetic operator
nibabel/cmdline/diff.py:166:38: E226 missing whitespace around arithmetic operator
nibabel/cmdline/diff.py:189:41: E261 at least two spaces before inline comment
nibabel/cmdline/diff.py:197:40: E226 missing whitespace around arithmetic operator

@chrispycheng
Copy link
Contributor

chrispycheng commented Sep 20, 2018

For the diff function itself:

  • FIX test_utils (and I'm not really sure how to do this one) on line np.testing.assert_equal(main(test_names, StringIO()), expected_difference) which spits out:
    ValueError: operands could not be broadcast together with shapes (4,5,7) (128,96,24,2)
  • FIX test_display_diff on line assert_equal(display_diff(bogus_names, dict_values), expected_output) which is different obviously because the output was changed. A simple correction to Field/File will do
  • FIX test_scripts which anticipates "Field" as opposed to the new header in the table output, line 75. A simple correction to Field/File will do

@chrispycheng
Copy link
Contributor

chrispycheng commented Sep 20, 2018

In response to your TODO concerns above:

  1. Parameter name lengths aren't a problem - can't be more abbreviated than -ma, anyways. I reckon it'll be nice to be able to see what -ma stands for in the code, but I don't expect anyone to use those longer option names.
  2. Output looks fine overall but maybe we want a bit of spacing in the table headers with file names between the colon and subsequent value. This way we can minimize confusion between the numbering of each file and each file name itself.

@coveralls
Copy link

coveralls commented Sep 25, 2018

Coverage Status

Coverage increased (+0.01%) to 91.838% when pulling 716b1c6 on yarikoptic:enh-diff into be35aca on nipy:master.

@codecov-io
Copy link

codecov-io commented Sep 25, 2018

Codecov Report

Merging #661 into master will increase coverage by 0.05%.
The diff coverage is 89.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #661      +/-   ##
==========================================
+ Coverage   88.86%   88.91%   +0.05%     
==========================================
  Files          93       93              
  Lines       11378    11478     +100     
  Branches     1869     1899      +30     
==========================================
+ Hits        10111    10206      +95     
- Misses        930      933       +3     
- Partials      337      339       +2
Impacted Files Coverage Δ
nibabel/cmdline/diff.py 94.57% <89.47%> (-2.02%) ⬇️
nibabel/freesurfer/io.py 95.1% <0%> (+0.83%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7877add...716b1c6. Read the comment docs.

@chrispycheng
Copy link
Contributor

Coverage is handled and Travis passed and the only reason AppVeyor isn't cruising is because it's glitched out with some NiBabel stuff unrelated to this pull request.

Copy link
Member Author

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thanks Chris!
Left minor comments on fixups


Returns
-------
TODO
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be so kind to ad address this one as well?

@@ -11,7 +11,7 @@
import nibabel as nib
import numpy as np
from nibabel.cmdline.utils import *
from nibabel.cmdline.diff import get_headers_diff, display_diff, main, get_data_diff
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of us has managed to make this file executable, please undo:
capture _2018-09-27-20-06-42
If you like a challenge - undo by rewriting that original commit. Workflow:

  • fix, commit
  • git rebase -i BADCOMMIT^ where you reposition fixing commit after the one to fix, and give it s status to squash them into one
  • git push -f since now you rewritten a commit

@@ -72,11 +72,14 @@ def check_nib_diff_examples():
fnames = [pjoin(DATA_PATH, f)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here about permissions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jk I got it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good ;)

for item in checked_fields:
if item not in stdout:
print(item)
print(stdout)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some Gods dislike such printouts (although I am not sure if that want left by me here :-)). This print will get lost since when you run all tests at once, errors details reported at the end whenever print happened long before. Add a msg to your assert below providing what you want us to see when it fails

@yarikoptic
Copy link
Member Author

No feedback will be considered to be positive feedback, so we would leave making and logic as is. Used this prototype already a few times, came in handy

@yarikoptic
Copy link
Member Author

yarikoptic commented Sep 28, 2018

Wow, appveyor really went nuts there. Hopefully some other pr will fix it up

My favorite there

FAIL: nibabel.tests.test_minc2.TestMinc2File.test_mincfile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python35-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "c:\python35-x64\lib\site-packages\nibabel\tests\test_minc1.py", line 159, in test_mincfile
    assert_equal(mnc.get_data_dtype().type, tp['dtype'])
AssertionError: <class 'numpy.float64'> != <class 'numpy.float64'>

@chrispycheng
Copy link
Contributor

AppVeyor's fail here is fake news, it's tripping up again

@effigies effigies mentioned this pull request Oct 1, 2018
19 tasks
Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable. Some suggestions for clarity.

@@ -101,8 +116,8 @@ def get_headers_diff(file_headers, names=None):
return difference


def get_data_diff(files):
"""Get difference between md5 values
def get_data_md5_diff(files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just future-proofing: How about naming it get_data_hash_diff? (Understood that the hash will be MD5 for the foreseeable future.)

Parameters
----------
files: list of (str or ndarray)
If list of strings is provided -- they must be existing file names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndarray I assume means a data block equivalent to one loaded with nib.load().get_fdata() or similar?

str: absolute and relative differences of each file, given as float
"""
# we are doomed to keep them in RAM now
data = [f if isinstance(f, np.ndarray) else nib.load(f).get_data() for f in files]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that get_data() is on its way out, I would suggest using either dataobj.get_unscaled() or get_fdata(np.float32) here, depending on whether you're interested in on-disk values or (more likely) scaled values.

Returns
-------
OrderedDict
str: absolute and relative differences of each file, given as float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the shape of the output, given this. Are the values 2-tuples, or a list of N-1 2-tuples? And what is the absolute diff for a file? Presumably the (max/mean/median) of the voxelwise absolute diffs. It would help to be explicit here.

type=float,
default=0.0,
help="Maximal relative difference in data between files to tolerate."
" If also --data-max-abs-diff specified, only the data points "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If --data-max-abs-diff is also specified

"""

# we are doomed to keep them in RAM now
data = [f if isinstance(f, np.ndarray) else nib.load(f).get_fdata() for f in files]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that get_fdata() returns float64 arrays by default. If you're hoping to keep things moderately compact, you could use get_fdata(dtype=np.float32). The precision loss should not affect equivalent files, and should be in the noise for any plausible MRI data.

"""

# we are doomed to keep them in RAM now
data = [f if isinstance(f, np.ndarray) else nib.load(f).get_fdata(dtype=np.float32) for f in files]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long for the style checks.

@effigies
Copy link
Member

effigies commented Oct 3, 2018

@chrispycheng @yarikoptic I just went ahead and fixed the style issue. I'm happy to merge if you're all set on this one.

@yarikoptic
Copy link
Member Author

in local conversion with @chrispycheng I have questioned the change in 034c276 "hardcoding" the data type to float32. I think no type conversion should be done so if files are of different data type, and thus possibly of different values (e.g. 1/3-rd would be different in float32 and float64), we could see that. so may be that change should be reverted?

@effigies
Copy link
Member

effigies commented Oct 3, 2018

get_fdata() is always np.float64 unless otherwise specified.

Do you want to compare on-disk values (dataobj.get_unscaled()) or effective values (get_fdata())? get_data does give you sort of that sweet spot of "maybe it's one, maybe it's the other", but it's also supposed to be deprecated as of Apr 2018...

@yarikoptic
Copy link
Member Author

get_fdata() is always np.float64 unless otherwise specified.

get_unscaled probably would be "too low level".
We are typically interested in the effective values. With all the scaling etc we indeed can't rely on original data types and I would have stick to the default (float64) as providing the best "fidelity" as decided by nibabel in general to be the default data type to be returned. May be eventually get_fdata would get somehow smarter and not downcast if data type is e.g. float128. So the less of casting we do here, the better imho

@effigies
Copy link
Member

effigies commented Oct 3, 2018

I would have stick to the default (float64) as providing the best "fidelity" as decided by nibabel in general to be the default data type to be returned

Okay. Just making sure that's what you wanted. If diffing BOLD series, that could get expensive quickly.

@yarikoptic
Copy link
Member Author

I would have stick to the default (float64) as providing the best "fidelity" as decided by nibabel in general to be the default data type to be returned

Okay. Just making sure that's what you wanted. If diffing BOLD series, that could get expensive quickly.

indeed... but if we decide to provide help for those, we should just add another parameter (dtype) so user could specify explicitly. Could be done in a separate PR.

@yarikoptic
Copy link
Member Author

Hi @effigies, need your guidance here with appveyor.
python3.4 environment - the beast fails to install hypothesis from "source tarball" failing with

  Downloading https://files.pythonhosted.org/packages/b0/58/5def5924eb6068f50855dfb74f9fe230a6437b7d104cd593c8d04daef400/hypothesis-3.74.1.tar.gz (177kB)
    Complete output from command python setup.py egg_info:
    C:\Users\appveyor\AppData\Local\Temp\1\pip-install-a58k4iic\hypothesis\setup.py:39: UserWarning: This version of setuptools is too old to correctly store conditional dependencies in binary wheels.  For more info, see:  https://hynek.me/articles/conditional-python-dependencies/
      'This version of setuptools is too old to correctly store '
    c:\python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    error in hypothesis setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected version spec in enum34; python_version=="2.7" at ; python_version=="2.7"

It is a known issue to the hypothesis people (HypothesisWorks/hypothesis#1091) which they decided just to ignore altogether. The funny part is that there is a whl for hypothesis for python3, and some times it gets installed (just fine) instead of trying to get it installed from the source tarball. That is when appveyor doesn't fail.
So I wonder what do you think we should we do about it? I do not think we should merge into master as is to not cause those sporadic failures on appveyor

@effigies
Copy link
Member

effigies commented Oct 3, 2018

My best guess from reading this is we need to update setuptools in appveyor. I'll look around a little.

@effigies
Copy link
Member

effigies commented Oct 3, 2018

@effigies
Copy link
Member

effigies commented Oct 3, 2018

Feel free to cherry-pick 716b1c6 (on upstream/appveyor/upgrade_setuptools). At least it doesn't seem to cause problems, and the setuptools is woefully outdated. That said, it hasn't tried to build hypothesis from source. It found the wheels in both the 32-bit and 64-bit jobs.

@effigies
Copy link
Member

effigies commented Oct 4, 2018

Pushed the Appveyor update directly to this branch, as it at least causes no harm. Please let me know what the status is on this PR.

@chrispycheng
Copy link
Contributor

@effigies seems like still no dice with appveyor

@effigies
Copy link
Member

effigies commented Oct 4, 2018

Those AppVeyor bugs are hitting every PR and master. Nothing to do with hypothesis that I can see.

@yarikoptic
Copy link
Member Author

"those" is a spectrum here ;-) Some were due to hypothesis, some not.

@yarikoptic
Copy link
Member Author

AVSD - AppVeyor Spectrum Disorder

@yarikoptic
Copy link
Member Author

eh, this one is a base now for 2 other PRs (#672 and #678), besides that commit to update setuptools which didn't help. @effigies - what do you prefer?

@effigies
Copy link
Member

Let's close this in favor of #678. I'll update the name over there.

@effigies effigies closed this Oct 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants