RF+ENH: nib-diff - allow to specify absolute and/or relative maximal diff to tolerate #661

yarikoptic · 2018-09-14T20:46:20Z

So now it should be possible to get an idea on how much data in the given files differs:

$> nib-diff --ma 0.000001 --mr .001 ./tests-run/output/./sub-1_T1w_5mm_noise_corrected.nii.gz /tmp/sub-1_T1w_5mm_noise_corrected.nii.gz
These files are different.
Field          1:sub-1_T1w_5mm_noise_corrected.nii.gz                   2:sub-1_T1w_5mm_noise_corrected.nii.gz
DATA(md5)      65df09c06b236342eaf7e2fe57aabf55                       3c6e9069e6e054e714f2894419848df0
DATA(diff 1:)  -                                                      abs: 7.6293945e-06, rel: 0.002224694

and could be applied to >2 files as well

TODOs

seek feedback regarding naming of parameters (e.g. data_max_abs_diff) - they are kind of a mouthful, but I wanted to be specific in case we later add some dedicated to the header etc.
seek feedback regarding output formatting
seek feedback regarding "logic" - ATM specifying --ma also affects estimation of the relative difference. I did it on purpose so we could easily avoid divisions by tiny numbers, which could yield high values. But maybe that is undesired?
add tests
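
To illustrate the "logic" question in the TODOs, here is a minimal sketch (not the code in this PR) of how an absolute tolerance can also gate the relative-difference estimate. The function name `max_diffs`, the flat-sequence input, and the definition of the relative difference as |x - y| divided by the mean of |x| and |y| are all assumptions made for illustration.

```python
def max_diffs(a, b, max_abs=0.0):
    """Return (max absolute diff, max relative diff) of two equal-length sequences.

    Assumption: relative diff is |x - y| / mean(|x|, |y|). Points whose
    absolute diff does not exceed max_abs are skipped when estimating the
    relative diff, so tiny values cannot blow it up.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    worst_abs = 0.0
    worst_rel = 0.0
    for x, y in zip(a, b):
        abs_diff = abs(x - y)
        worst_abs = max(worst_abs, abs_diff)
        if abs_diff <= max_abs:
            # Tolerated point; this also covers identical values (e.g. 0 vs 0),
            # so the division below never sees a zero denominator.
            continue
        scale = (abs(x) + abs(y)) / 2.0
        worst_rel = max(worst_rel, abs_diff / scale)
    return worst_abs, worst_rel


a = [1.0, 1e-9, 100.0]
b = [1.0, 2e-9, 100.5]
no_tolerance = max_diffs(a, b)
with_tolerance = max_diffs(a, b, max_abs=1e-6)
```

With `max_abs=1e-6` the pair (1e-9, 2e-9) no longer dominates the relative difference, which is the effect the TODO item describes.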

attn @chrispycheng


chrispycheng · 2018-09-20T16:00:14Z

Some style stuff to get out of the way:

nibabel/cmdline/diff.py:164:29: E226 missing whitespace around arithmetic operator
nibabel/cmdline/diff.py:166:38: E226 missing whitespace around arithmetic operator
nibabel/cmdline/diff.py:189:41: E261 at least two spaces before inline comment
nibabel/cmdline/diff.py:197:40: E226 missing whitespace around arithmetic operator

chrispycheng · 2018-09-20T16:21:06Z

For the diff function itself:

FIX test_utils (and I'm not really sure how to do this one) on line np.testing.assert_equal(main(test_names, StringIO()), expected_difference) which spits out:
ValueError: operands could not be broadcast together with shapes (4,5,7) (128,96,24,2)
FIX test_display_diff on line assert_equal(display_diff(bogus_names, dict_values), expected_output) which is different obviously because the output was changed. A simple correction to Field/File will do
FIX test_scripts which anticipates "Field" as opposed to the new header in the table output, line 75. A simple correction to Field/File will do

chrispycheng · 2018-09-20T16:34:27Z

In response to your TODO concerns above:

Parameter name lengths aren't a problem - can't be more abbreviated than --ma, anyways. I reckon it'll be nice to be able to see what --ma stands for in the code, but I don't expect anyone to use those longer option names.
Output looks fine overall but maybe we want a bit of spacing in the table headers with file names between the colon and subsequent value. This way we can minimize confusion between the numbering of each file and each file name itself.


coveralls · 2018-09-25T15:33:34Z

Coverage increased (+0.01%) to 91.838% when pulling 716b1c6 on yarikoptic:enh-diff into be35aca on nipy:master.

codecov-io · 2018-09-25T15:41:37Z

Codecov Report

Merging #661 into master will increase coverage by 0.05%.
The diff coverage is 89.47%.

@@            Coverage Diff             @@
##           master     #661      +/-   ##
==========================================
+ Coverage   88.86%   88.91%   +0.05%     
==========================================
  Files          93       93              
  Lines       11378    11478     +100     
  Branches     1869     1899      +30     
==========================================
+ Hits        10111    10206      +95     
- Misses        930      933       +3     
- Partials      337      339       +2

Impacted Files	Coverage Δ
nibabel/cmdline/diff.py	`94.57% <89.47%> (-2.02%)`	⬇️
nibabel/freesurfer/io.py	`95.1% <0%> (+0.83%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7877add...716b1c6.


chrispycheng · 2018-09-27T23:46:39Z

Coverage is handled and Travis passed and the only reason AppVeyor isn't cruising is because it's glitched out with some NiBabel stuff unrelated to this pull request.

yarikoptic

Awesome. Thanks Chris!
Left minor comments on fixups

yarikoptic · 2018-09-28T00:01:16Z

nibabel/cmdline/diff.py

+
+    Returns
+    -------
+    TODO


Would you be so kind as to address this one as well?

yarikoptic · 2018-09-28T00:11:33Z

nibabel/cmdline/tests/test_utils.py

@@ -11,7 +11,7 @@
 import nibabel as nib
 import numpy as np
 from nibabel.cmdline.utils import *
-from nibabel.cmdline.diff import get_headers_diff, display_diff, main, get_data_diff


One of us has managed to make this file executable, please undo:

If you like a challenge - undo by rewriting that original commit. Workflow:

fix, commit

git rebase -i BADCOMMIT^ where you reposition the fixing commit right after the one to fix, and give it the s (squash) status to merge them into one

git push -f since you have now rewritten a commit

yarikoptic · 2018-09-28T00:13:11Z

nibabel/tests/test_scripts.py

@@ -72,11 +72,14 @@ def check_nib_diff_examples():
    fnames = [pjoin(DATA_PATH, f)


The same here about permissions

Please clarify?

Jk I got it

yarikoptic · 2018-09-28T00:16:45Z

nibabel/tests/test_scripts.py

    for item in checked_fields:
+        if item not in stdout:
+            print(item)
+            print(stdout)


Some Gods dislike such printouts (although I am not sure if that wasn't left by me here :-)). This print will get lost: when you run all tests at once, error details are reported at the end, whereas the print happened long before. Add a msg to your assert below providing what you want us to see when it fails.
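
A minimal sketch of that suggestion, assuming a plain `assert` statement; the helper name `check_fields_present` is hypothetical, and the real test may use a different assertion helper:

```python
def check_fields_present(checked_fields, stdout):
    """Fail with a message naming the missing field and showing the output."""
    for item in checked_fields:
        # The message travels with the failure, so it shows up in the
        # error report at the end of the test run instead of getting lost.
        assert item in stdout, (
            "%r not found in nib-diff output:\n%s" % (item, stdout)
        )


# Passes silently when every field is present.
check_fields_present(["DATA(md5)", "DATA(diff 1:)"],
                     "DATA(md5) ...\nDATA(diff 1:) ...")
```

This keeps the diagnostic next to the failing assertion, so no separate print is needed.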

yarikoptic · 2018-09-28T00:19:53Z

No feedback will be considered to be positive feedback, so we will leave naming and logic as is. I have used this prototype a few times already, and it came in handy.

yarikoptic · 2018-09-28T00:26:19Z

Wow, appveyor really went nuts there. Hopefully some other pr will fix it up

My favorite there

FAIL: nibabel.tests.test_minc2.TestMinc2File.test_mincfile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python35-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "c:\python35-x64\lib\site-packages\nibabel\tests\test_minc1.py", line 159, in test_mincfile
    assert_equal(mnc.get_data_dtype().type, tp['dtype'])
AssertionError: <class 'numpy.float64'> != <class 'numpy.float64'>


chrispycheng · 2018-09-29T15:48:08Z

AppVeyor's fail here is fake news, it's tripping up again

effigies

Looks reasonable. Some suggestions for clarity.

effigies · 2018-10-01T16:16:01Z

nibabel/cmdline/diff.py

@@ -101,8 +116,8 @@ def get_headers_diff(file_headers, names=None):
    return difference


-def get_data_diff(files):
-    """Get difference between md5 values
+def get_data_md5_diff(files):


Just future-proofing: How about naming it get_data_hash_diff? (Understood that the hash will be MD5 for the foreseeable future.)

effigies · 2018-10-01T16:20:50Z

nibabel/cmdline/diff.py

+    Parameters
+    ----------
+    files: list of (str or ndarray)
+      If list of strings is provided -- they must be existing file names


ndarray I assume means a data block equivalent to one loaded with nib.load().get_fdata() or similar?

effigies · 2018-10-01T16:21:51Z

nibabel/cmdline/diff.py

+        str: absolute and relative differences of each file, given as float
+    """
+    # we are doomed to keep them in RAM now
+    data = [f if isinstance(f, np.ndarray) else nib.load(f).get_data() for f in files]


Given that get_data() is on its way out, I would suggest using either dataobj.get_unscaled() or get_fdata(np.float32) here, depending on whether you're interested in on-disk values or (more likely) scaled values.

effigies · 2018-10-01T16:22:52Z

nibabel/cmdline/diff.py

+    Returns
+    -------
+    OrderedDict
+        str: absolute and relative differences of each file, given as float


I don't really understand the shape of the output, given this. Are the values 2-tuples, or a list of N-1 2-tuples? And what is the absolute diff for a file? Presumably the (max/mean/median) of the voxelwise absolute diffs. It would help to be explicit here.

effigies · 2018-10-01T16:28:37Z

nibabel/cmdline/diff.py

+               type=float,
+               default=0.0,
+               help="Maximal relative difference in data between files to tolerate."
+                    " If also --data-max-abs-diff specified, only the data points "


If --data-max-abs-diff is also specified


effigies · 2018-10-02T14:56:31Z

nibabel/cmdline/diff.py

+    """
+
+    # we are doomed to keep them in RAM now
+    data = [f if isinstance(f, np.ndarray) else nib.load(f).get_fdata() for f in files]


Note that get_fdata() returns float64 arrays by default. If you're hoping to keep things moderately compact, you could use get_fdata(dtype=np.float32). The precision loss should not affect equivalent files, and should be in the noise for any plausible MRI data.

effigies · 2018-10-02T16:23:36Z

nibabel/cmdline/diff.py

+    """
+
+    # we are doomed to keep them in RAM now
+    data = [f if isinstance(f, np.ndarray) else nib.load(f).get_fdata(dtype=np.float32) for f in files]


This line is too long for the style checks.

effigies · 2018-10-03T13:31:19Z

@chrispycheng @yarikoptic I just went ahead and fixed the style issue. I'm happy to merge if you're all set on this one.

yarikoptic · 2018-10-03T13:49:16Z

In a local conversation with @chrispycheng I questioned the change in 034c276 "hardcoding" the data type to float32. I think no type conversion should be done, so that if files are of different data types, and thus possibly of different values (e.g. 1/3 would differ between float32 and float64), we could see that. So maybe that change should be reverted?
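
The 1/3 example can be checked without any imaging library. This is only a sketch using the standard `struct` module to emulate a float32 round trip; the helper name `as_float32` is hypothetical and nothing here is nibabel code:

```python
import struct


def as_float32(value):
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


third64 = 1.0 / 3.0            # what a float64 file would hold
third32 = as_float32(third64)  # what a float32 file would hold

abs_diff = abs(third64 - third32)
# Relative diff taken against the mean magnitude (an assumption).
rel_diff = abs_diff / ((abs(third64) + abs(third32)) / 2.0)
```

The two values differ by a small but nonzero amount, which is exactly the kind of difference a forced cast to a common type would hide.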

effigies · 2018-10-03T13:56:20Z

get_fdata() is always np.float64 unless otherwise specified.

Do you want to compare on-disk values (dataobj.get_unscaled()) or effective values (get_fdata())? get_data does give you sort of that sweet spot of "maybe it's one, maybe it's the other", but it's also supposed to be deprecated as of Apr 2018...

yarikoptic · 2018-10-03T14:29:00Z

> get_fdata() is always np.float64 unless otherwise specified.

get_unscaled would probably be "too low level".
We are typically interested in the effective values. With all the scaling etc. we indeed can't rely on the original data types, and I would have stuck to the default (float64), since nibabel in general chose it as the default returned data type providing the best "fidelity". Maybe get_fdata will eventually get smarter and not downcast if the data type is e.g. float128. So the less casting we do here, the better, imho.

effigies · 2018-10-03T14:31:19Z

> I would have stuck to the default (float64) as providing the best "fidelity" as decided by nibabel in general to be the default data type to be returned

Okay. Just making sure that's what you wanted. If diffing BOLD series, that could get expensive quickly.
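
A rough back-of-the-envelope sketch of that cost. The BOLD shape below is hypothetical, and `array_nbytes` is an illustrative helper, not a nibabel function:

```python
import math


def array_nbytes(shape, itemsize):
    """Bytes needed to hold a dense array of the given shape in RAM."""
    return math.prod(shape) * itemsize


bold_shape = (64, 64, 36, 300)  # hypothetical 4D BOLD series

bytes_f64 = array_nbytes(bold_shape, 8)  # float64: 8 bytes per value
bytes_f32 = array_nbytes(bold_shape, 4)  # float32: 4 bytes per value

mib_f64 = bytes_f64 / 2 ** 20
mib_f32 = bytes_f32 / 2 ** 20
```

For this shape the float64 copy takes 337.5 MiB per file versus 168.75 MiB for float32, and every file being compared is held in RAM at once.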

yarikoptic · 2018-10-03T14:53:28Z

> > I would have stuck to the default (float64) as providing the best "fidelity" as decided by nibabel in general to be the default data type to be returned

> Okay. Just making sure that's what you wanted. If diffing BOLD series, that could get expensive quickly.

indeed... but if we decide to provide help for those, we should just add another parameter (dtype) so the user could specify it explicitly. Could be done in a separate PR.
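
A sketch of what such a parameter could look like. The `--dtype` flag name, its choices, and the use of `argparse` are all assumptions for illustration and do not describe the actual nib-diff command line:

```python
import argparse


def build_parser():
    """Build a toy parser with a hypothetical --dtype option."""
    parser = argparse.ArgumentParser(prog="nib-diff-sketch")
    parser.add_argument("files", nargs="+", help="Files to compare")
    parser.add_argument(
        "--dtype",
        default="float64",  # keep the default fidelity unless asked otherwise
        choices=["float32", "float64"],
        help="Data type to load the data as before comparing (hypothetical)",
    )
    return parser


args = build_parser().parse_args(["--dtype", "float32", "a.nii.gz", "b.nii.gz"])
```

Keeping float64 as the default preserves the behavior discussed in this thread, while a user diffing large series could opt into float32 explicitly.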


yarikoptic · 2018-10-03T17:07:43Z

Hi @effigies, need your guidance here with appveyor.
python3.4 environment - the beast fails to install hypothesis from "source tarball" failing with

  Downloading https://files.pythonhosted.org/packages/b0/58/5def5924eb6068f50855dfb74f9fe230a6437b7d104cd593c8d04daef400/hypothesis-3.74.1.tar.gz (177kB)
    Complete output from command python setup.py egg_info:
    C:\Users\appveyor\AppData\Local\Temp\1\pip-install-a58k4iic\hypothesis\setup.py:39: UserWarning: This version of setuptools is too old to correctly store conditional dependencies in binary wheels.  For more info, see:  https://hynek.me/articles/conditional-python-dependencies/
      'This version of setuptools is too old to correctly store '
    c:\python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    error in hypothesis setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected version spec in enum34; python_version=="2.7" at ; python_version=="2.7"

It is an issue known to the hypothesis people (HypothesisWorks/hypothesis#1091), which they decided to ignore altogether. The funny part is that there is a whl for hypothesis for python3, and sometimes it gets installed (just fine) instead of the source tarball. That is when appveyor doesn't fail.
So what do you think we should do about it? I do not think we should merge into master as is, so as not to cause those sporadic failures on appveyor.

effigies · 2018-10-03T17:14:57Z

My best guess from reading this is we need to update setuptools in appveyor. I'll look around a little.

effigies · 2018-10-03T17:20:27Z

Watch this build: https://ci.appveyor.com/project/nipy/nibabel/build/1.0.501

effigies · 2018-10-03T18:16:03Z

Feel free to cherry-pick 716b1c6 (on upstream/appveyor/upgrade_setuptools). At least it doesn't seem to cause problems, and the setuptools is woefully outdated. That said, it hasn't tried to build hypothesis from source. It found the wheels in both the 32-bit and 64-bit jobs.

effigies · 2018-10-04T19:57:39Z

Pushed the Appveyor update directly to this branch, as it at least causes no harm. Please let me know what the status is on this PR.

chrispycheng · 2018-10-04T21:13:17Z

@effigies seems like still no dice with appveyor

effigies · 2018-10-04T23:28:09Z

Those AppVeyor bugs are hitting every PR and master. Nothing to do with hypothesis that I can see.

yarikoptic · 2018-10-05T02:45:18Z

"those" is a spectrum here ;-) Some were due to hypothesis, some not.

yarikoptic · 2018-10-05T02:45:52Z

AVSD - AppVeyor Spectrum Disorder

yarikoptic · 2018-10-11T15:54:29Z

eh, this one is a base now for 2 other PRs (#672 and #678), besides that commit to update setuptools which didn't help. @effigies - what do you prefer?

close this here and have RF+BF: Add tolerances and data types for nib-diff, remove hypothesis dependency #678 replace it
or merge this one and then the others (RF+BF: Add tolerances and data types for nib-diff, remove hypothesis dependency #678 I think is ready)

effigies · 2018-10-11T15:56:28Z

Let's close this in favor of #678. I'll update the name over there.

yarikoptic added 2 commits September 14, 2018 16:38

ENH: nib-diff Field/File not just Field in the header

018eceb

yarikoptic added the enhancement label Sep 14, 2018

chrispycheng added 3 commits September 21, 2018 13:37

changed as commented out in the pull request

833b4df

changes as commented

9b123f6

RF: anticipated files of different shapes, fixed table display, corrected tests

1e33ea7

elaborated docstring, modified get_data_diff to allow direct array input, added tests for coverage

76ca32f

yarikoptic commented Sep 28, 2018


chrispycheng added 2 commits September 28, 2018 12:59

added to diff documentation, undid executable change, took out debugging script in test_scripts

0aa6370

undid permission snafu on test_scripts

d057249

effigies mentioned this pull request Oct 1, 2018

REL: 2.3.1 #667

Merged

19 tasks

effigies reviewed Oct 1, 2018


docstring and function name clarification, change get_data to get_fdata()

76ee358

effigies reviewed Oct 2, 2018


corrected styles per Travis, limited fdata to float32

034c276

effigies reviewed Oct 2, 2018


STY: Break overly-long line

19fcdd5

chrispycheng added 2 commits October 3, 2018 13:03

prepared for future PR to allow modification of dtype used in diff comparison

93c7bb6

Merge branch 'enh-diff' of https://github.com/yarikoptic/nibabel into enh-diff

cd85e09

CI: Update pip/setuptools in AppVeyor

716b1c6

chrispycheng mentioned this pull request Oct 3, 2018

TST: Validate nib-diff dtype command-line option #672

Merged

2 tasks

yarikoptic mentioned this pull request Oct 4, 2018

BF(workaround): skip test_diff if hypothesis is not importable #675

Closed

yarikoptic force-pushed the enh-diff branch from 212bb15 to 716b1c6 Compare October 6, 2018 02:53

effigies closed this Oct 11, 2018
