Skip to content

Conversation

@snastase
Copy link
Contributor

@snastase snastase commented Jan 12, 2019

I've added a tolerate_nans option to the isc and isfc functions to better handle NaNs, particularly when averaging time series for N–1 subjects in the leave-one-out approach. NaNs are typically introduced prior to ISC analyses by z-scoring voxels with zero variance (due to susceptibility, limited EPI FoV; see issue #398). The default value is True, which accommodates (i.e., ignores) any NaNs when computing the average of N–1 subject time series by using np.nanmean. If the user wants to handle NaNs beforehand, they can set tolerate_nans=False, which will return a NaN even if there's a single NaN in the time series for any single subject—but this uses np.mean rather than np.nanmean, which I believe is faster. Alternatively, the user can supply a float between 0.0 and 1.0 indicating the threshold proportion of subjects with non-NaN values needed to keep the voxel. For example, if the threshold is 0.8, any voxel with >= 80% subjects with non-NaN values will be kept and the average of N–1 subject time series will be computed to tolerate the remaining NaNs; if a voxel has < 80% subjects with non-NaN values (i.e., > 20% of subjects have NaNs), the voxel is excluded from the computation. In all cases, any voxel with a NaN in the time series across all subjects is removed prior to any computation, as this will always result in NaN in the final output (to reduce compute time). If voxels with NaNs are censored out, the resulting output will still match the original number of voxels with NaN values padded back in appropriately. I'm arguing to leave the default value as True rather than some particular proportion, because I'm not sure any specific proportion is actually reported consistently in the literature. I had to add an option to return NaNs in brainiak.fcma.util._normalize_for_correlation to make sure the isfc function (which uses fcma.util.compute_correlation for speed) returns NaNs consistent with isc (and other Python correlation functions; see issue #400). This PR relies on a merge of PR #404. Finally, I added a vectorize_isfcs option to isfc, which by default (True) vectorizes the upper triangle of the ISFC matrices, yielding a 2D n_subjects (or n_pairs) by n_connections (pairs of voxels) array; if vectorize_isfcs=False, we keep the square (redundant) ISFC matrix and output is shaped n_subjects (or n_pairs) by n_voxels by n_voxels. This is more consistent with the shape of the isc output, and is more compatible with the subsequent statistical tests.

@snastase
Copy link
Contributor Author

I had to backpedal and rethink this. Simple scipy.spatial.distance.squareform discards the diagonal of the square ISFC matrix, which in our case is meaningful (the ISC values). I wrote a squareform_isfc function that mimics squareform functionality but retains the diagonal ISC values. So if you pass a n_subjects by n_voxels by n_voxels array of ISFCs to squareform_isfc, it returns a tuple comprising the condensed off-diagonal ISFC values and the diagonal ISC values. Alternatively, if you pass in the condensed off-diagonal ISFCs and the ISCs, it gives you back the original 3D array by squareforming the ISFCs and populating the diagonal with ISCs. If you set vectorize_isfcs=True (the default) in isfc, it calls squareform_isfc internally and returns this tuple of ISFCs and ISCs, both of which can be passed directly to the bootstrap_isc and permutation_isc statistical tests. I also took @qihongl's comment from PR #384 and now collapse first dimension when singleton. Hope it smooths things out a bit—open to suggestions on this!

@snastase
Copy link
Contributor Author

@mihaic Any idea what's going on with this gcc mpich dependency error in the Travis Xcode 7.3 Mac build?

@mihaic
Copy link
Member

mihaic commented Jan 15, 2019

@snastase, that looks like a glitch. You should be able to restart that particular job yourself on the Travis website. Can you please try?

@snastase
Copy link
Contributor Author

@qihongl @manojneuro Can I get another pair of eyes on this? Thanks!

@qihongl
Copy link
Member

qihongl commented Jan 22, 2019

@mihaic @snastase I'd love to take a look on this! Sorry about the delayed response. I'm a little over overwhelmed recently XD


else:
logger.info("ISC computation will not tolerate NaNs when averaging")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is better to explicitly do if tolerate_nans is False: here, and return something like "invalid input" for the else: block (e.g. if tolerate_nans is a string)?

@qihongl
Copy link
Member

qihongl commented Jan 27, 2019

btw, I was wondering if it is more intuitive to set vectorize_isfcs to False by default? The redundant matrix form ISFC values seem to be easier to understand / friendlier to new users?
Are there some good reasons to use the vectorized representation by default?

@qihongl
Copy link
Member

qihongl commented Jan 27, 2019

I just went through the code once. Frankly, I don't have a lot of comments. I will go over everything again later w/ your tutorial later.
Thanks for the enhancement! These new features look awesome!

@snastase
Copy link
Contributor Author

snastase commented Mar 3, 2019

Newest commits should resolve issues #396 and #397.

@mihaic
Copy link
Member

mihaic commented Mar 7, 2019

@snastase, we need to review and merge PR #414 for the Jenkins Linux test to pass.

@snastase
Copy link
Contributor Author

Hey @mihaic—do you know what's breaking the Linux build? I thought I had successfully merged in the #414 fix at 57a9d09. Any ideas?

@mihaic
Copy link
Member

mihaic commented Mar 18, 2019

@snastase, no sure. I'll try again.

@mihaic
Copy link
Member

mihaic commented Mar 18, 2019

Jenkins, retest this please.

try:
(data_array_4d, _, _, _) = _check_timeseries_input(array_4d)
except ValueError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytest has a nice way to check exceptions:
https://docs.pytest.org/en/latest/assert.html#assertions-about-expected-exceptions

Up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Incorporated this and looks a bit cleaner.

@mihaic
Copy link
Member

mihaic commented Mar 19, 2019

Not sure what's up with Jenkins. Please feel free to ask it to test again whenever the error doesn't make sense. (See my comment from yesterday for the exact phrasing.)

@snastase
Copy link
Contributor Author

Jenkins, retest this please.

@mihaic
Copy link
Member

mihaic commented Mar 19, 2019

I hope you're not waiting for me for this PR. I could go through it again to leave more code-related comments, but I think it is more important to reach an agreement on the method changes and for that we should hear from @qihongl.

@snastase
Copy link
Contributor Author

snastase commented Mar 19, 2019

@qihongl, @manojneuro, and I had some discussions about this offline as well. I wanted to summarize the changes here because this sort of sprawled beyond the original scope of the PR. First, we can now accommodate NaNs when averaging N–1 time series in the leave-one-out approach, and can support an arbitrary proportion of NaNs to accommodate (masking out any voxels that exceed this threshold; see #398, #400, #404). Second, and importantly, the output of (symmetric) ISFC analysis can be represented as either a 3-dimensional array of redundant square matrices with shape n_subjects (or n_pairs) by n_voxels by n_voxels or a tuple containing the condensed triangles of the ISFC matrices and the diagonals (see #399). The goal of the latter arrangement is to keep the ISFC outputs 2-dimensional so they can easily be passed into statistical tests. We also provide the isc.squareform_isfc function to easily switch between the two (redundant square and condensed vector) representations. Third, the isfc function can now take a separate targets array as input (see #409). In this approach, the ISFC computation is not among voxels in the standard input data, but between each voxel in the input and each voxel (or ROI, or whatever) in the targets array, which could, for example, be a separate set of target ROIs. This typically results in non-square, asymmetric ISFC matrices which are output in the square (not condensed) format; this is only implemented for the leave-one-out approach. Finally, we remove/rearrange some duplicate functionality, namely phase_randomize and p_from_null in utils (see #396, #397). All of these modifications have defaults that shouldn't disrupt preexisting usage; except the ISFC output shape. Currently, my thought is to return the ISFCs in the condensed format, but I could be convinced otherwise. I can update any issues this might introduce in tutorials.

@qihongl
Copy link
Member

qihongl commented Mar 25, 2019

Awesome! I will read the new code this week!

@qihongl
Copy link
Member

qihongl commented Mar 27, 2019

I think all of these changes look great! @snastase @mihaic

Copy link
Member

@mihaic mihaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snastase, could you please add a news item? It should at least cover the changed behavior of p_from_null as discussed in issue #397. See the Pip docs for what a new item looks like:
https://pip.pypa.io/en/latest/development/contributing/#news-entries
and a historical example:
b0a34e7#diff-a53c26f4c374db1512fc0b35896f642f

I realized we forgot to announce the major ISC changes in PR #384 which we released in v0.8. I think it would be good to add a news item for that PR as well now, directly in the NEWS file.

@snastase
Copy link
Contributor Author

@mihaic Is this reasonable for news? I also ran through @manojneuro's tutorials and make a couple minor fixes to ensure that rolling this out won't break things.

@mihaic
Copy link
Member

mihaic commented Mar 27, 2019

It looks great! I'll merge it as soon as testing is done.

@mihaic
Copy link
Member

mihaic commented Mar 27, 2019

Jenkins, retest this please.

@snastase snastase removed the request for review from manojneuro March 27, 2019 23:41
@snastase snastase merged commit c2df4a1 into brainiak:master Mar 27, 2019
@snastase snastase deleted the isc-nans branch March 27, 2019 23:50
mihaic added a commit that referenced this pull request Apr 10, 2019
It will be added to the next release where it belongs, not v0.8.
In PR #405, I mistakenly suggested editing NEWS directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants