Handling NaNs in ISC/ISFC (and ISFC output shape) by snastase · Pull Request #405 · brainiak/brainiak

snastase · 2019-01-12T22:59:08Z

I've added a tolerate_nans option to the isc and isfc functions to better handle NaNs, particularly when averaging time series for N–1 subjects in the leave-one-out approach. NaNs are typically introduced prior to ISC analyses by z-scoring voxels with zero variance (due to susceptibility, limited EPI FoV; see issue #398). The default value is True, which accommodates (i.e., ignores) any NaNs when computing the average of N–1 subject time series by using np.nanmean. If the user wants to handle NaNs beforehand, they can set tolerate_nans=False, which will return a NaN even if there's a single NaN in the time series for any single subject—but this uses np.mean rather than np.nanmean, which I believe is faster. Alternatively, the user can supply a float between 0.0 and 1.0 indicating the threshold proportion of subjects with non-NaN values needed to keep the voxel. For example, if the threshold is 0.8, any voxel with >= 80% subjects with non-NaN values will be kept and the average of N–1 subject time series will be computed to tolerate the remaining NaNs; if a voxel has < 80% subjects with non-NaN values (i.e., > 20% of subjects have NaNs), the voxel is excluded from the computation. In all cases, any voxel with a NaN in the time series across all subjects is removed prior to any computation, as this will always result in NaN in the final output (to reduce compute time). If voxels with NaNs are censored out, the resulting output will still match the original number of voxels with NaN values padded back in appropriately. I'm arguing to leave the default value as True rather than some particular proportion, because I'm not sure any specific proportion is actually reported consistently in the literature. I had to add an option to return NaNs in brainiak.fcma.util._normalize_for_correlation to make sure the isfc function (which uses fcma.util.compute_correlation for speed) returns NaNs consistent with isc (and other Python correlation functions; see issue #400). This PR relies on a merge of PR #404. Finally, I added a vectorize_isfcs option to isfc, which by default (True) vectorizes the upper triangle of the ISFC matrices, yielding a 2D n_subjects (or n_pairs) by n_connections (pairs of voxels) array; if vectorize_isfcs=False, we keep the square (redundant) ISFC matrix and output is shaped n_subjects (or n_pairs) by n_voxels by n_voxels. This is more consistent with the shape of the isc output, and is more compatible with the subsequent statistical tests.

snastase · 2019-01-15T03:24:02Z

I had to backpedal and rethink this. Simple scipy.spatial.distance.squareform discards the diagonal of the square ISFC matrix, which in our case is meaningful (the ISC values). I wrote a squareform_isfc function that mimics squareform functionality but retains the diagonal ISC values. So if you pass a n_subjects by n_voxels by n_voxels array of ISFCs to squareform_isfc, it returns a tuple comprising the condensed off-diagonal ISFC values and the diagonal ISC values. Alternatively, if you pass in the condensed off-diagonal ISFCs and the ISCs, it gives you back the original 3D array by squareforming the ISFCs and populating the diagonal with ISCs. If you set vectorize_isfcs=True (the default) in isfc, it calls squareform_isfc internally and returns this tuple of ISFCs and ISCs, both of which can be passed directly to the bootstrap_isc and permutation_isc statistical tests. I also took @qihongl's comment from PR #384 and now collapse first dimension when singleton. Hope it smooths things out a bit—open to suggestions on this!

snastase · 2019-01-15T18:23:38Z

@mihaic Any idea what's going on with this gcc mpich dependency error in the Travis Xcode 7.3 Mac build?

mihaic · 2019-01-15T18:25:05Z

@snastase, that looks like a glitch. You should be able to restart that particular job yourself on the Travis website. Can you please try?

snastase · 2019-01-21T21:02:31Z

@qihongl @manojneuro Can I get another pair of eyes on this? Thanks!

qihongl · 2019-01-22T19:57:42Z

@mihaic @snastase I'd love to take a look on this! Sorry about the delayed response. I'm a little over overwhelmed recently XD

qihongl · 2019-01-27T21:08:25Z

brainiak/isc.py

+
+    else:
+        logger.info("ISC computation will not tolerate NaNs when averaging")
+


I wonder if it is better to explicitly do if tolerate_nans is False: here, and return something like "invalid input" for the else: block (e.g. if tolerate_nans is a string)?

qihongl · 2019-01-27T21:22:12Z

btw, I was wondering if it is more intuitive to set vectorize_isfcs to False by default? The redundant matrix form ISFC values seem to be easier to understand / friendlier to new users?
Are there some good reasons to use the vectorized representation by default?

qihongl · 2019-01-27T21:24:00Z

I just went through the code once. Frankly, I don't have a lot of comments. I will go over everything again later w/ your tutorial later.
Thanks for the enhancement! These new features look awesome!

snastase · 2019-03-03T23:20:06Z

Newest commits should resolve issues #396 and #397.

mihaic · 2019-03-07T21:51:54Z

@snastase, we need to review and merge PR #414 for the Jenkins Linux test to pass.

snastase · 2019-03-18T18:46:50Z

Hey @mihaic—do you know what's breaking the Linux build? I thought I had successfully merged in the #414 fix at 57a9d09. Any ideas?

mihaic · 2019-03-18T18:56:26Z

@snastase, no sure. I'll try again.

mihaic · 2019-03-18T18:56:36Z

Jenkins, retest this please.

mihaic · 2019-03-18T21:00:35Z

tests/utils/test_utils.py

Pytest has a nice way to check exceptions:
https://docs.pytest.org/en/latest/assert.html#assertions-about-expected-exceptions

Up to you.

Cool! Incorporated this and looks a bit cleaner.

mihaic · 2019-03-19T18:36:56Z

Not sure what's up with Jenkins. Please feel free to ask it to test again whenever the error doesn't make sense. (See my comment from yesterday for the exact phrasing.)

snastase · 2019-03-19T18:38:14Z

Jenkins, retest this please.

mihaic · 2019-03-19T18:55:52Z

I hope you're not waiting for me for this PR. I could go through it again to leave more code-related comments, but I think it is more important to reach an agreement on the method changes and for that we should hear from @qihongl.

snastase · 2019-03-19T21:25:48Z

@qihongl, @manojneuro, and I had some discussions about this offline as well. I wanted to summarize the changes here because this sort of sprawled beyond the original scope of the PR. First, we can now accommodate NaNs when averaging N–1 time series in the leave-one-out approach, and can support an arbitrary proportion of NaNs to accommodate (masking out any voxels that exceed this threshold; see #398, #400, #404). Second, and importantly, the output of (symmetric) ISFC analysis can be represented as either a 3-dimensional array of redundant square matrices with shape n_subjects (or n_pairs) by n_voxels by n_voxels or a tuple containing the condensed triangles of the ISFC matrices and the diagonals (see #399). The goal of the latter arrangement is to keep the ISFC outputs 2-dimensional so they can easily be passed into statistical tests. We also provide the isc.squareform_isfc function to easily switch between the two (redundant square and condensed vector) representations. Third, the isfc function can now take a separate targets array as input (see #409). In this approach, the ISFC computation is not among voxels in the standard input data, but between each voxel in the input and each voxel (or ROI, or whatever) in the targets array, which could, for example, be a separate set of target ROIs. This typically results in non-square, asymmetric ISFC matrices which are output in the square (not condensed) format; this is only implemented for the leave-one-out approach. Finally, we remove/rearrange some duplicate functionality, namely phase_randomize and p_from_null in utils (see #396, #397). All of these modifications have defaults that shouldn't disrupt preexisting usage; except the ISFC output shape. Currently, my thought is to return the ISFCs in the condensed format, but I could be convinced otherwise. I can update any issues this might introduce in tutorials.

qihongl · 2019-03-25T18:09:40Z

Awesome! I will read the new code this week!

qihongl · 2019-03-27T16:10:52Z

I think all of these changes look great! @snastase @mihaic

mihaic

@snastase, could you please add a news item? It should at least cover the changed behavior of p_from_null as discussed in issue #397. See the Pip docs for what a new item looks like:
https://pip.pypa.io/en/latest/development/contributing/#news-entries
and a historical example:
b0a34e7#diff-a53c26f4c374db1512fc0b35896f642f

I realized we forgot to announce the major ISC changes in PR #384 which we released in v0.8. I think it would be good to add a news item for that PR as well now, directly in the NEWS file.

snastase · 2019-03-27T22:40:27Z

@mihaic Is this reasonable for news? I also ran through @manojneuro's tutorials and make a couple minor fixes to ensure that rolling this out won't break things.

mihaic · 2019-03-27T22:41:18Z

It looks great! I'll merge it as soon as testing is done.

mihaic · 2019-03-27T22:48:51Z

Jenkins, retest this please.

It will be added to the next release where it belongs, not v0.8. In PR #405, I mistakenly suggested editing NEWS directly.

snastase added 9 commits January 10, 2019 16:09

Tolerance for NaNs for ISC/ISFC, and tests

ed77dce

Messing with NaN options in isfc

1106224

Option to return NaNs in compute_correlation

d21773a

Moved NaN handling into _normalize_for_correlation

1d1fc0c

Merge branch 'fcma-nans' into isc-nans

c632206

NaN handling for ISFC and tests

6a12190

Reshaped ISFC output to n_subjects x n_voxel_pairs

966c182

Fixed whitespace errors etc

02d9729

tolerate_nans in phaseshift_isc and timeshift_isc tests

fcaf5d5

snastase requested review from manojneuro, mihaic and qihongl January 13, 2019 00:16

Merge branch 'master' into isc-nans

46c51d3

Added new function to squareform ISFCs and keep diagonal

6b943f5

snastase force-pushed the isc-nans branch from 0410e5f to 6b943f5 Compare January 15, 2019 19:37

qihongl reviewed Jan 27, 2019

View reviewed changes

snastase added 4 commits February 27, 2019 19:01

Replaced old p_from_null with new version

daa5d41

Removed ISC test NIfTI data because we're simulating

69f2d47

Moved phase randomization in phaseshift_isc to utils

dba670a

Removed vestigial ecdf from utils and tests

aa0c37d

This was referenced Mar 3, 2019

utils: Document difference between compute_p_from_null_distribution and p_from_null #397

Closed

isc: phaseshift_isc overlaps with brainiak.utils.phase_randomize #396

Closed

snastase added 3 commits March 14, 2019 13:43

isfc can now take 'targets' as input (asymmetric)

7b4e709

Merge branch 'master' into isc-nans

57a9d09

Fixed up 'targets' for ISFCs, added tests

90117b6

Increasing some bits of code coverage

e456419

mihaic reviewed Mar 18, 2019

View reviewed changes

mihaic mentioned this pull request Mar 18, 2019

dev: Add coverage threshold for patch #415

Merged

Modified exception tests to use shmancy pytest.raises

d71ae14

qihongl approved these changes Mar 27, 2019

View reviewed changes

mihaic requested changes Mar 27, 2019

View reviewed changes

snastase added 2 commits March 27, 2019 18:27

Added news items for issues and PR

4cc6417

Updated NEWS directly for outdated PR brainiak#384

357376b

snastase removed the request for review from manojneuro March 27, 2019 23:41

mihaic approved these changes Mar 27, 2019

View reviewed changes

snastase merged commit c2df4a1 into brainiak:master Mar 27, 2019

snastase deleted the isc-nans branch March 27, 2019 23:50

mihaic mentioned this pull request Apr 8, 2019

docs: Move PR #384 NEWS entry to newsfragments #417

Merged

mihaic added a commit that referenced this pull request Apr 10, 2019

docs: Move PR #384 NEWS entry to newsfragments (#417)

24040c3

It will be added to the next release where it belongs, not v0.8. In PR #405, I mistakenly suggested editing NEWS directly.


		else:
		logger.info("ISC computation will not tolerate NaNs when averaging")

Conversation

snastase commented Jan 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

snastase commented Jan 15, 2019

Uh oh!

snastase commented Jan 15, 2019

Uh oh!

mihaic commented Jan 15, 2019

Uh oh!

snastase commented Jan 21, 2019

Uh oh!

qihongl commented Jan 22, 2019

Uh oh!

qihongl Jan 27, 2019

Choose a reason for hiding this comment

Uh oh!

qihongl commented Jan 27, 2019

Uh oh!

qihongl commented Jan 27, 2019

Uh oh!

snastase commented Mar 3, 2019

Uh oh!

mihaic commented Mar 7, 2019

Uh oh!

snastase commented Mar 18, 2019

Uh oh!

mihaic commented Mar 18, 2019

Uh oh!

mihaic commented Mar 18, 2019

Uh oh!

mihaic Mar 18, 2019

Choose a reason for hiding this comment

Uh oh!

snastase Mar 19, 2019

Choose a reason for hiding this comment

Uh oh!

mihaic commented Mar 19, 2019

Uh oh!

snastase commented Mar 19, 2019

Uh oh!

mihaic commented Mar 19, 2019

Uh oh!

snastase commented Mar 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qihongl commented Mar 25, 2019

Uh oh!

qihongl commented Mar 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mihaic left a comment

Choose a reason for hiding this comment

Uh oh!

snastase commented Mar 27, 2019

Uh oh!

mihaic commented Mar 27, 2019

Uh oh!

mihaic commented Mar 27, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

snastase commented Jan 12, 2019 •

edited

Loading

snastase commented Mar 19, 2019 •

edited

Loading

qihongl commented Mar 27, 2019 •

edited

Loading