Skip to content

Moved the RT fmrisim generator and updated it to use resource stream#460

Merged
mihaic merged 37 commits into
brainiak:masterfrom
CameronTEllis:move_RT_fmrisim
May 5, 2020
Merged

Moved the RT fmrisim generator and updated it to use resource stream#460
mihaic merged 37 commits into
brainiak:masterfrom
CameronTEllis:move_RT_fmrisim

Conversation

@CameronTEllis

Copy link
Copy Markdown
Contributor

No description provided.

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

@mihaic Hi Mihai, I am running into issues with using tmp_path. I have tried to follow the protocol they describe in the pytest link you sent but can't find it. Can you advise how the tests/utils/test_fmrisim_real_time.py should be written to use tmp_path

Comment thread tests/utils/test_fmrisim_real_time.py Outdated
Comment thread tests/utils/test_fmrisim_real_time.py Outdated
Comment thread brainiak/utils/fmrisim_real_time_generator.py
Comment thread brainiak/utils/fmrisim_real_time_generator.py
@CameronTEllis

Copy link
Copy Markdown
Contributor Author

After many pep8 errors, I now have new errors I am not sure about. The first one throws an error because I do not have enough arguments, but that is precisely the test I am running that should be caught by the pytest command. Is that because it is not in a function? The other errors are peculiar. Is it something about how the dictionary data type is being altered by being supplied a resource stream?

@mihaic

mihaic commented Apr 8, 2020

Copy link
Copy Markdown
Member

Yes, testing for raised exceptions should be part of Pytest functions.

However, the current errors are raised by run-checks, not run-tests, so they are unrelated to Pytest. They are raised by the Mypy type checker.

Mypy assumes all the values in data_dict are the same type as the first value you set, which is the output of resource_stream:

data_dict['ROI_A_file'] = resource_stream(gen.__name__,
                                          "sim_parameters/ROI_A.nii.gz")

You can avoid it by explicitly specifying a type for for data_dict that accepts any value type:

from typing import Dict

data_dict: Dict = {}

You can disable type checking for any line by adding a # type: ignore comment. For example, for testing that the function call raises an exception:

    gen.generate_data()  # type: ignore

@mihaic

mihaic commented Apr 15, 2020

Copy link
Copy Markdown
Member

@CameronTEllis, in case you are baffled by the current error, note that nibabel.load can only deal with file paths. To work with the io.BufferedReader object returned by resource_stream, you can do something like this:

import gzip

roi_a_file = resource_stream(__name__, "ROI_A.nii.gz")
image = nibabel.nifti1.Nifti1Image.from_bytes(gzip.decompress(roi_a_file.read()))

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

@mihaic Hmm still strange errors. this one seems to be a version issue perhaps: from_bytes was added recently. This solution is also quite cumbersome, are there alternative solutions to find the path to something in the resource stream so that it can be treated as a path?

@mihaic

mihaic commented Apr 15, 2020

Copy link
Copy Markdown
Member

It is cumbersome, but you have implemented it already. :)

You can try the following for the import error:

from nibabel.nifti1 import Nifti1Image
_from_bytes = Nifti1Image.from_bytes

Note that I wrote _from_bytes because it is not public. Nevertheless, I recommend you use the more customary form Nifti1Image.from_bytes.

@mihaic

mihaic commented Apr 16, 2020

Copy link
Copy Markdown
Member

I recommend changing the signature of def generate_data, to explicitly include all the file path keys currently in data_dict, including noise_dict_file. That means incorporating the _get_input_names into generate_data. This will simplify the docstring as well (note that it is currently out of sync with the code and not formatted for Sphinx).

from typing import Any, Dict

def generate_data(
    output_dir: str,
    data_dict: Dict[str, Any],
    roi_a_file: str = None,
    roi_b_file: str = None,
    template_path: str = None,
    noise_dict_file: str = None,
) -> None:

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

Hmm I am not familiar with this style of signature. But separate from that, how does this solve the problem? If noise_dict_file is sometimes a resource stream, sometimes just a path to a file, how will it load this file as text? Moreover, if I set default values to 'None' then the code will crash without setting these values, whereas the current set up should run the default files unless specified otherwise, which is the correct behavior.

@mihaic

mihaic commented Apr 17, 2020

Copy link
Copy Markdown
Member

I should not have changed the style, sorry about that. (The style follows PEP 484, as mentioned in our contributing documentation.)

My suggestion is to move the code from _get_input_names into generate_data. You are already dealing in _get_input_names with the case when noise_dict_file is None. Also, callers of generate_data will only pass paths, not resource streams. Resource streams are only an internal mechanism to use the default files shipped with BrainIAK when the callers do not pass paths. Does that make sense?

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

Sorry I am still confused. The resource stream call I do to get the text file from _get_input_names fails because you can't use open() on a resource stream object.

The whole reason we got into this resource stream business is because I need a way to read the files that are in the sim_parameters folder to this function. It would obviously be easier if there was a way for this code to automatically find the path to the sim_parameters folder and just use that but we were told that wasn't possible with a conda install. Is that not the case?

Using these files should be the default behavior but users can specify other paths if they want to. Incorporating _get_input_names into generate_data is fine but the reason I separated them was because I was exceeding the complexity limits. Moreover, incorporating this won't solve the problem of reading in the text file as a resource stream.

@mihaic

mihaic commented Apr 17, 2020

Copy link
Copy Markdown
Member

Right, you cannot use open with resource_stream. In our earlier discussion we selected a way to load the file data into Nifti1Image. I was assuming you were going to apply a similar approach to reading the noise_dict_file file data. Sorry for not providing an example. Here is how you can do it:

text_in_noise_dict_file = resource_stream(__name__, "sub_noise_dict.txt").read().decode()

You can keep _get_input_names separate and put this code there to help with complexity.

Note that you should change the test code to pass None for all the files.

There are other ways to read data files shipped with Python packages, but I recommend we continue using resource_stream.

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

@mihaic Hi I assumed that the temp path that was created would be a string but it seems to be a PosixPath. Is there a way to convert that to a string or to append a suffix to the path to specify a certain file? From the documentation it seems like: tmp_path / $str could work

@mihaic

mihaic commented Apr 30, 2020

Copy link
Copy Markdown
Member

Indeed, tmp_path / "rt_007.npy" will work.

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

@mihaic is this saying there is an indent on line 11 of the docstring? If so, I don't see it: there are no characters on that line

@mihaic

mihaic commented May 1, 2020

Copy link
Copy Markdown
Member

First, congrats for getting Pytest to succeed!

The line numbers in docstring errors are misleading, because they refer to the docstring, not the file. We documented this issue some time ago, but unfortunately we still do not have a solution:
#42

In this case, the file line number is 11 + 3 (the start of the the docstring) = 14:

numTRs - Specify the number of time points

Remember that docstrings must follow the Sphinx format. In particular:

indentation is significant in reST, so all lines of the same paragraph must be left-aligned to the same level of indentation
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#paragraphs

Note that the text documenting the generate_data function is not a docstring, so it is not visible in Python, nor in the HTML documentation. Please make it a NumPy-like docstring as our contributing guideline mentions:
https://www.sphinx-doc.org/en/stable/usage/extensions/example_numpy.html

@CameronTEllis

Copy link
Copy Markdown
Contributor Author

@mihaic Suggestions on how to resolve the Python 3.5 vs 3.6 issue? Seems like a different solution is needed for this dictionary mem map issue

@mihaic

mihaic commented May 4, 2020

Copy link
Copy Markdown
Member

I forgot that type annotations for variables were only introduced in Python 3.6. Let's maintain compatibility with Python 3.5 for the moment. Please change to:

data_dict = {}  # type: Dict

@gdoubleyew gdoubleyew left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cameron, looks good! Thanks for making all these default settings, that will make it very easy to get started using. Also, thanks for making all the test functions.

@mihaic mihaic merged commit 00da44e into brainiak:master May 5, 2020
@mihaic

mihaic commented May 5, 2020

Copy link
Copy Markdown
Member

Thanks, @CameronTEllis & @gdoubleyew.

Note that I changed from package_data to include_package_data in setup.py. In the future, we only need to add files to Git to have them show up in the installation.

The Conda error we were seeing was caused by sim_parameters not being a Python package, which was causing the inclusion of data files to fail silently. We would have needed to prefix each file with the sim_parameters directory. See:
https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants