-
Notifications
You must be signed in to change notification settings - Fork 141
Handling NaNs in ISC/ISFC (and ISFC output shape) #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I had to backpedal and rethink this. Simple |
|
@mihaic Any idea what's going on with this gcc mpich dependency error in the Travis Xcode 7.3 Mac build? |
|
@snastase, that looks like a glitch. You should be able to restart that particular job yourself on the Travis website. Can you please try? |
|
@qihongl @manojneuro Can I get another pair of eyes on this? Thanks! |
|
|
||
| else: | ||
| logger.info("ISC computation will not tolerate NaNs when averaging") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is better to explicitly do if tolerate_nans is False: here, and return something like "invalid input" for the else: block (e.g. if tolerate_nans is a string)?
|
btw, I was wondering if it is more intuitive to set |
|
I just went through the code once. Frankly, I don't have a lot of comments. I will go over everything again later w/ your tutorial later. |
|
@snastase, no sure. I'll try again. |
|
Jenkins, retest this please. |
tests/utils/test_utils.py
Outdated
| try: | ||
| (data_array_4d, _, _, _) = _check_timeseries_input(array_4d) | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pytest has a nice way to check exceptions:
https://docs.pytest.org/en/latest/assert.html#assertions-about-expected-exceptions
Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Incorporated this and looks a bit cleaner.
|
Not sure what's up with Jenkins. Please feel free to ask it to test again whenever the error doesn't make sense. (See my comment from yesterday for the exact phrasing.) |
|
Jenkins, retest this please. |
|
I hope you're not waiting for me for this PR. I could go through it again to leave more code-related comments, but I think it is more important to reach an agreement on the method changes and for that we should hear from @qihongl. |
|
@qihongl, @manojneuro, and I had some discussions about this offline as well. I wanted to summarize the changes here because this sort of sprawled beyond the original scope of the PR. First, we can now accommodate NaNs when averaging N–1 time series in the leave-one-out approach, and can support an arbitrary proportion of NaNs to accommodate (masking out any voxels that exceed this threshold; see #398, #400, #404). Second, and importantly, the output of (symmetric) ISFC analysis can be represented as either a 3-dimensional array of redundant square matrices with shape |
|
Awesome! I will read the new code this week! |
mihaic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snastase, could you please add a news item? It should at least cover the changed behavior of p_from_null as discussed in issue #397. See the Pip docs for what a new item looks like:
https://pip.pypa.io/en/latest/development/contributing/#news-entries
and a historical example:
b0a34e7#diff-a53c26f4c374db1512fc0b35896f642f
I realized we forgot to announce the major ISC changes in PR #384 which we released in v0.8. I think it would be good to add a news item for that PR as well now, directly in the NEWS file.
|
@mihaic Is this reasonable for news? I also ran through @manojneuro's tutorials and make a couple minor fixes to ensure that rolling this out won't break things. |
|
It looks great! I'll merge it as soon as testing is done. |
|
Jenkins, retest this please. |
It will be added to the next release where it belongs, not v0.8. In PR #405, I mistakenly suggested editing NEWS directly.
I've added a
tolerate_nansoption to theiscandisfcfunctions to better handle NaNs, particularly when averaging time series for N–1 subjects in the leave-one-out approach. NaNs are typically introduced prior to ISC analyses by z-scoring voxels with zero variance (due to susceptibility, limited EPI FoV; see issue #398). The default value isTrue, which accommodates (i.e., ignores) any NaNs when computing the average of N–1 subject time series by usingnp.nanmean. If the user wants to handle NaNs beforehand, they can settolerate_nans=False, which will return a NaN even if there's a single NaN in the time series for any single subject—but this usesnp.meanrather thannp.nanmean, which I believe is faster. Alternatively, the user can supply a float between 0.0 and 1.0 indicating the threshold proportion of subjects with non-NaN values needed to keep the voxel. For example, if the threshold is 0.8, any voxel with >= 80% subjects with non-NaN values will be kept and the average of N–1 subject time series will be computed to tolerate the remaining NaNs; if a voxel has < 80% subjects with non-NaN values (i.e., > 20% of subjects have NaNs), the voxel is excluded from the computation. In all cases, any voxel with a NaN in the time series across all subjects is removed prior to any computation, as this will always result in NaN in the final output (to reduce compute time). If voxels with NaNs are censored out, the resulting output will still match the original number of voxels with NaN values padded back in appropriately. I'm arguing to leave the default value asTruerather than some particular proportion, because I'm not sure any specific proportion is actually reported consistently in the literature. I had to add an option to return NaNs inbrainiak.fcma.util._normalize_for_correlationto make sure theisfcfunction (which usesfcma.util.compute_correlationfor speed) returns NaNs consistent withisc(and other Python correlation functions; see issue #400). This PR relies on a merge of PR #404. Finally, I added avectorize_isfcsoption toisfc, which by default (True) vectorizes the upper triangle of the ISFC matrices, yielding a 2Dn_subjects(orn_pairs) byn_connections(pairs of voxels) array; ifvectorize_isfcs=False, we keep the square (redundant) ISFC matrix and output is shapedn_subjects(orn_pairs) byn_voxelsbyn_voxels. This is more consistent with the shape of theiscoutput, and is more compatible with the subsequent statistical tests.