This repository contains reproduction code for the analyses reported in Shain & Fedorenko (2025).
The entire analysis can only be fully reproduced by people with internal access to Fedorenko lab
files due to consent restrictions on the raw data. However, much of the code can still be run on
the publicly available parcellation data (https://openneuro.org/datasets/ds006071). This codebase
assumes that the parcellation data has been downloaded to the path ../../results/fMRI_parcellate
relative to the root of this repository.
Installation is just a matter of setting up an Anaconda environment with the right software dependencies. Once you have Anaconda installed, you can create the environment by running the following command in the terminal:

    conda env create -f conda.yml

This will create a new environment called parcellate with all the necessary dependencies,
which you can activate by running:

    conda activate parcellate

Usage simply consists of the following scripts, which can be run via the command line from
the root of this repository. Usage for individual scripts can be viewed by running them
with the --help flag.
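You can print the help text for several scripts in one go. The Python sketch below assumes you run it from the repository root inside the activated parcellate environment. The module list is a hand-picked subset of the commands documented in this README, and the helper names (`help_command`, `show_help`) are hypothetical.

```python
import subprocess
import sys

# Hand-picked subset of modules documented in this README (not exhaustive).
MODULES = [
    "langlocfc.set_data_path",
    "langlocfc.initialize",
    "langlocfc.plot_performance",
]


def help_command(module):
    """Build the argv list for `python -m <module> --help` with the current interpreter."""
    return [sys.executable, "-m", module, "--help"]


def show_help(modules, run=False):
    """Print each --help command line; if run=True, also execute it."""
    for module in modules:
        cmd = help_command(module)
        print(" ".join(cmd))
        if run:
            # check=False keeps the loop going if one module fails to import.
            subprocess.run(cmd, check=False)
```

For example, `show_help(MODULES, run=True)` prints each command line followed by that script's own `--help` output. With the default `run=False`, it only prints the command lines.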
The first script, which must be run before all others, sets the data paths interactively:

    python -m langlocfc.set_data_path

Note that both this codebase and the parcellate config files assume that this repository,
the parcellate repository (https://github.com/coryshain/parcellate), and the parcellation
results (downloaded from OpenNeuro) are all siblings in a shared parent directory.
Deviations from this structure may require changes to code or configuration files.
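Before running anything, you can check this layout with the sketch below. The sibling directory names are assumptions. `parcellate` matches the GitHub repository name, and `results/fMRI_parcellate` mirrors the `../results/fMRI_parcellate` paths that this README later uses relative to the parcellate repository. Adjust them if your checkout differs. The function name `missing_siblings` is hypothetical.

```python
from pathlib import Path

# Assumed sibling names; adjust to match your own checkout.
EXPECTED_SIBLINGS = ("parcellate", "results/fMRI_parcellate")


def missing_siblings(repo_root, expected=EXPECTED_SIBLINGS):
    """Return the expected sibling directories that do not exist next to repo_root."""
    parent = Path(repo_root).resolve().parent
    return [name for name in expected if not (parent / name).is_dir()]
```

Run from the root of this repository, `missing_siblings(".")` returns an empty list when the layout matches.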
The main analyses are initialized by a script (initialize.py) that crawls a lab-internal data
directory, locates relevant metadata, and compiles it into one configuration file (*.yml) per
scanning session in the dataset, in a format that can be read by the parcellate code.
This script requires two private resources and can therefore only be run lab-internally:
access to the raw data and a table called evlab_experiments_2024-06-18.csv. Internal
users wishing to run this script should reach out to either author.

    python -m langlocfc.initialize <ARGS>

Our parcellation uses a slightly modified preprocessing configuration based on the default
configuration in the Fedorenko lab: pipeline_preproc_Parcellate.cfg. This file can be used
to run preprocessing via the CONN toolbox. The script make_preprocessing_jobs.py generates
SLURM batch scripts that run preprocessing for each session in parallel on a computer cluster.
As with initialize.py, this script requires access to the raw data and can only be run
lab-internally.
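The sketch below illustrates the one-batch-script-per-session pattern for readers unfamiliar with SLURM. It is a hypothetical example, not the output of make_preprocessing_jobs.py. The template, resource requests, job names, and the command run for each session are all placeholders.

```python
from pathlib import Path

# Placeholder template: resource requests and the per-session command are illustrative only.
SBATCH_TEMPLATE = """#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --time={time}
#SBATCH --mem={mem}
#SBATCH --output={job_name}.out

{command}
"""


def render_batch_script(session, command, time="24:00:00", mem="16G"):
    """Return the text of a SLURM batch script that runs `command` for one session."""
    return SBATCH_TEMPLATE.format(
        job_name=f"preproc_{session}", time=time, mem=mem, command=command
    )


def write_batch_scripts(commands_by_session, out_dir):
    """Write one batch script per session into out_dir and return the script paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for session, command in commands_by_session.items():
        path = out_dir / f"preproc_{session}.sh"
        path.write_text(render_batch_script(session, command))
        paths.append(path)
    return paths
```

You could then submit each generated file with `sbatch <script>`, and the scheduler would run the sessions in parallel.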

    python -m langlocfc.make_preprocessing_jobs

Secondary analyses of the influence of preprocessing parameters on the final results can be
initialized by running initialize_search.py:

    python -m langlocfc.initialize_search

Secondary analyses of the influence of different amounts of data on the final results can be
initialized by running initialize_multisession.py:

    python -m langlocfc.initialize_multisession

A table of all DICOMs used in these analyses can be generated by running get_dicoms.py.
This script requires access to the raw data and can only be run lab-internally.

    python -m langlocfc.get_dicoms

Oracle analyses (labeling directly against localizer task activations) can be added to
any given set of parcellate config files by running add_oracle.py:

    python -m langlocfc.add_oracle <CONFIG_PATHS>

Some of the plots and figures in the paper (namely individual-participant brain visualizations
and bar charts across sessions) require command line scripts from the parcellate package
(https://github.com/coryshain/parcellate) to be run first. To compute individual-participant
brain maps, run the following from the root of the parcellate repository:

    python -m parcellate.plot <CONFIG_PATHS> -t atlas -A LANG LANA -e Lang_S-N

where <CONFIG_PATHS> is a list of paths to configuration files for each session you want to
include in the plot. The config paths for this project are located in the root of each
session-specific parcellation directory in our data release. For example, there are several
sessions in the ../results/fMRI_parcellate/derivatives/nolangloc directory. The config
files for all of these sessions can be obtained with the following bash syntax:

    ../results/fMRI_parcellate/derivatives/nolangloc/*/config.yml

To compute plot data for across-session bar plots, run the following from the root of the
parcellate repository:

    python -m parcellate.plot <CONFIG_PATHS> -o <OUTPUT_PATH> -D

and to compute line plots over parcellation granularities k, run the following from the root
of the parcellate repository:

    python -m parcellate.plot <CONFIG_PATHS> -t grid -o <OUTPUT_PATH> -D

where <CONFIG_PATHS> is a list of paths to configuration files for each session you want to
include in the plot, and <OUTPUT_PATH> is the path to save the output data. For each parcellation
configuration directory, you should place the plots into a subdirectory called plots. For example,
to generate plots for the main nolangloc setting, assuming the directory for these results is
at ../results/fMRI_parcellate/derivatives/nolangloc relative to the parcellate repository,
the output path should be ../results/fMRI_parcellate/derivatives/nolangloc/plots.
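The three parcellate.plot commands above can also be assembled programmatically. The Python sketch below expands the config glob for one results directory and builds the argument lists, sending the plot data to that directory's plots subdirectory. It assumes you run it from the root of the parcellate repository with the sibling layout described earlier. `plot_commands` is a hypothetical helper, and it uses only the flags shown in this README.

```python
import glob
import os
import shlex

# Default follows the nolangloc example above (relative to the parcellate repository root).
DEFAULT_RESULTS_DIR = "../results/fMRI_parcellate/derivatives/nolangloc"


def plot_commands(results_dir=DEFAULT_RESULTS_DIR):
    """Build argv lists for the brain-map, bar-plot, and grid parcellate.plot commands."""
    # Equivalent of the bash glob <results_dir>/*/config.yml, sorted for a stable order.
    configs = sorted(glob.glob(os.path.join(results_dir, "*", "config.yml")))
    plots_dir = os.path.join(results_dir, "plots")
    base = ["python", "-m", "parcellate.plot", *configs]
    return [
        base + ["-t", "atlas", "-A", "LANG", "LANA", "-e", "Lang_S-N"],  # brain maps
        base + ["-o", plots_dir, "-D"],  # data for across-session bar plots
        base + ["-t", "grid", "-o", plots_dir, "-D"],  # data for line plots over k
    ]


if __name__ == "__main__":
    for cmd in plot_commands():
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

To run the commands instead of printing them, you could pass each list to `subprocess.run(cmd, check=True)`.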
Once these plotting commands have been run, the main plots for the paper can be generated using various scripts in this directory.
The main plots (bar and line charts in the paper) across sessions are generated by
plot_performance.py:

    python -m langlocfc.plot_performance

The complete by-session brain plots can be generated by stitch_brains.py:

    python -m langlocfc.stitch_brains

Plots of k-means samples used in the schematic figure in the paper can be generated by
plot_sample.py:

    python -m langlocfc.plot_sample <ARGS>

Plots of group-averaged LangFC networks can be generated by get_group_averages.py:

    python -m langlocfc.get_group_averages <ARGS>

The collection of most highly stable brain plots (used to sample sessions for figures in the
main article) can be obtained by get_brain_plots.py:

    python -m langlocfc.get_brain_plots <ARGS>

Tables of metadata for a given batch of results can be obtained using get_metadata.py:

    python -m langlocfc.get_metadata <ARGS>

Plots of data efficiency (number of runs) in parcellation can be generated by plot_efficiency.py:

    python -m langlocfc.plot_efficiency

Plots of results of searching across preprocessing parameters can be generated by plot_search.py:

    python -m langlocfc.plot_search

Plots of results of the oracle analysis (labeling directly against localizer task
activations) can be generated by plot_oracle.py:

    python -m langlocfc.plot_oracle

Analyses of LangFC stability within and between subjects can be generated by plot_stability.py:

    python -m langlocfc.stability <ARGS>

Analyses of the effect of task regression can be initialized by task_regression_init.py. This script
requires access to the raw data and can only be run lab-internally.
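Conceptually, task regression fits a GLM with task regressors to each voxel's timecourse and keeps the residuals as the task-regressed timecourse. The plain-Python sketch below illustrates that operation for a single timecourse. It is a generic illustration, not the code in task_regression.py. The regressors are assumed to be given; in practice they would typically be HRF-convolved task predictors handled with a numerical library. The sketch orthogonalizes an intercept plus the regressors using modified Gram-Schmidt and then subtracts the projection, which yields the same residuals as ordinary least squares.

```python
def regress_out(y, regressors, tol=1e-9):
    """Return residuals of y after least-squares regression on `regressors` plus an intercept."""
    n = len(y)
    if any(len(r) != n for r in regressors):
        raise ValueError("each regressor must have the same length as y")
    # Design columns: intercept first, then the task regressors.
    columns = [[1.0] * n] + [[float(a) for a in r] for r in regressors]

    # Modified Gram-Schmidt: build an orthonormal basis for the span of the columns.
    basis = []
    for col in columns:
        v = col[:]
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol:  # skip columns that are collinear with earlier ones
            basis.append([a / norm for a in v])

    # Subtract the projection of y onto the basis; what remains is the OLS residual.
    resid = [float(a) for a in y]
    for q in basis:
        dot = sum(a * b for a, b in zip(resid, q))
        resid = [a - dot * b for a, b in zip(resid, q)]
    return resid
```

With one binary regressor plus the intercept, the residuals are each value minus its condition mean. For example, `regress_out([1, 4, 2, 8], [[0, 1, 0, 1]])` returns `[-0.5, -2.0, 0.5, 2.0]`.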

    python -m langlocfc.task_regression_init <ARGS>

This will produce a set of SLURM batch scripts that run the GLMs in parallel and produce
task-regressed timecourses (via multiple parallel calls to task_regression.py). Once these jobs
are complete, the timecourses can be parcellated using the config files in the parcellate_cfg
subdirectory of the task_regression directory. Once the parcellations are complete, the
parcellations with and without task regression can be compared using task_regression_comparison.py:

    python -m langlocfc.task_regression_comparison <ARGS>

Summary statistics from these analyses can be generated by compile_task_corr.py:

    python -m langlocfc.compile_task_corr

LangFC as identified by the LANG vs. LanA atlases can be compared using lang_v_lana.py:

    python -m langlocfc.lang_v_lana <ARGS>

LangFC-based reanalyses of Shain et al. (2024, J Cog Neuro) can be generated by pdd.py:

    python -m langlocfc.pdd

and plotted by plot_pdd.py:

    python -m langlocfc.plot_pdd