Code for decoding speech as text from neural data
This package contains Python code for the high-level aspects of decoding speech from neural data, including transfer learning across multiple subjects. It was used for all results in the paper "Machine translation of cortical activity to text with an encoder-decoder framework" (Makin et al., Nature Neuroscience, 2020). These high-level aspects include the structuring of the training, the organization by subjects, and the construction of TFRecords. The (low-level) training itself is done with the adjacent machine_learning package, which implements sequence-to-sequence networks in TensorFlow.
- Install TensorFlow 1.15.5, the final version of TF1.x:

      pip install tensorflow-gpu==1.15.5

  If you don't have a GPU, install the CPU version instead:

      pip install tensorflow==1.15.5

  Please consult the TensorFlow installation documents. The most important facts to know are that TF1.15 requires CUDA 10.0, libcudnn7>=7.6.5.32-1+cuda10.0, and libnccl2>=2.6.4-1+cuda10.0. (I have tested only up to, not beyond, the listed versions of these libraries.) Make sure the driver for your GPU is compatible with these versions of the cuDNN and NCCL libraries. Note also that the latest version of Python supported by TF1.15 is 3.7.

- Install the three required packages:

      git clone https://github.com/jgmakin/utils_jgm.git
      pip install -e utils_jgm
      git clone https://github.com/jgmakin/machine_learning.git
      pip install -e machine_learning
      git clone https://github.com/jgmakin/ecog2txt.git
      pip install -e ecog2txt

  Note that utils_jgm requires the user to set up a configuration file; please see the README for that package.
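Before installing, it can save time to confirm that your interpreter is not too new for TF1.15. The following is a minimal sketch, not part of this package; the function name `python_ok_for_tf115` is made up for illustration, and it checks only the upper bound (3.7) stated above.

```python
import sys

# Newest Python (major, minor) supported by TF1.15, per the note above.
MAX_SUPPORTED_PYTHON = (3, 7)


def python_ok_for_tf115(version_info=sys.version_info):
    """Return True if the interpreter is no newer than Python 3.7."""
    return tuple(version_info[:2]) <= MAX_SUPPORTED_PYTHON


if __name__ == "__main__":
    if python_ok_for_tf115():
        print("This Python version is not too new for TF1.15.")
    else:
        print("Python %d.%d is too new for TF1.15; use 3.7 or older."
              % tuple(sys.version_info[:2]))
```

Running this inside the environment where you plan to `pip install` tells you whether you need to create a separate Python 3.7 environment first.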
In order to unify the vast set of parameters (paths, experimental block structure, neural-network hyperparameters, etc.), all experiments are organized with the help of two configuration files, block_breakdowns.json and YOUR_EXPERIMENT_manifest.yaml. Examples of each are included in this repository.
- Edit the block_breakdowns.json to match your use case. The entries are

      SUBJECT_ID: {BLOCK: {"type": BLOCK_TYPE, "default_dataset": DEFAULT_DATASET_VALUE}}

  where the DEFAULT_DATASET_VALUE is one of "training"/"validation"/"testing", and the BLOCK_TYPE is whatever descriptive title you want to give to your block (e.g., "mocha-3"). Assigning types to the blocks allows them to be filtered out of datasets, according to information provided in the manifest (see next item). Place your edited copy into a directory we will call json_dir.

- Edit one of the .yaml manifest files to something sensible for your case. The most important thing to know is that many of the classes in this package (and machine_learning) load their default attributes from this manifest. That means that, even though the keyword arguments for their constructors (__init__() methods) may appear to default to None, this None actually instructs the class to default to the argument's value in the manifest.

  You don't have to set all the values before your first run, but at the very least you should:

  - Fix the paths/dirs. For the most part they are for writing, not reading, so you can set them wherever you like. For the three reading paths:
    - json_dir must point to the location of your block_breakdowns.json file (see previous item).
    - bad_electrodes_path must point to a (possibly empty) plain-text file listing (one entry per line) any bad channels. NB that these are assumed to be 1-indexed (but will be converted internally to zero-indexing)! Alternatively, you can provide (either via the manifest or as an argument to the ECoGDataGenerator) the good_electrodes directly.
    - electrode_path: you can ignore this unless you plan to plot results on the cortical surface (in which case contact me).
  - block_types: these set necessary conditions for membership in one of the datasets, training/validation/testing. For example, in the mochastar_word_sequence.yaml manifest file, the testing and validation sets are allowed to include only mocha-1, but the training set is allowed to include mocha-1, ..., mocha-9. So if a mocha-3 block has validation as its "default_dataset" in the block_breakdowns.json, it would be excluded altogether.
  - grid_size: Set this to match the dimensions of your ECoG grid.
  - text_sequence_vocab_file: You can provide a file with a list, one word per line, of all words to be targeted by the decoder. This key specifies just the name of the file; the file itself must live in the text_dir specified in __init__.py. If you set this key to None, the package will attempt to build a list of unique targets directly from the TFRecords. An example vocab_file, vocab.mocha-timit.1806, is included in this package.
  - data_mapping: Use this to set which data to use as inputs and outputs for the sequence-to-sequence network--see _ecog_token_generator below.
  - DataGenerator: In the manifest, this points to the ECoGDataGenerator in data_generators.py, but you will probably want to subclass this class and point to your new (sub)class instead--see next item.

  You can probably get away with leaving the rest of the values in the .yaml at their defaults, at least for your first run.

  Finally, make sure YOUR_EXPERIMENT_manifest.yaml lives at the text_dir specified in __init__.py (you can change this as you like, but remember that the text_sequence_vocab_file must live in the same directory).
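Two of the conventions described above, the interaction between "default_dataset" and block_types, and the 1-indexed bad-electrode file, can be illustrated with a short stand-alone sketch. This is not the package's own code: the subject ID, block numbers, block types, and the helper names `blocks_for_dataset` and `good_electrodes_from_bad` are all made up for illustration, and the value types in your real block_breakdowns.json should follow the example file in the repository.

```python
import json

# Hypothetical block_breakdowns.json contents for a single subject.
BLOCK_BREAKDOWNS_JSON = """
{
  "400": {
    "1": {"type": "mocha-1", "default_dataset": "training"},
    "2": {"type": "mocha-1", "default_dataset": "validation"},
    "3": {"type": "mocha-3", "default_dataset": "training"},
    "4": {"type": "mocha-3", "default_dataset": "validation"}
  }
}
"""

# Hypothetical block_types entry, in the spirit of the mocha example above.
BLOCK_TYPES = {
    "training": {"mocha-1", "mocha-3"},
    "validation": {"mocha-1"},
    "testing": {"mocha-1"},
}


def blocks_for_dataset(breakdown, block_types, dataset):
    """Return blocks assigned to `dataset` whose type is allowed there."""
    return sorted(
        block for block, info in breakdown.items()
        if info["default_dataset"] == dataset
        and info["type"] in block_types[dataset]
    )


def good_electrodes_from_bad(bad_lines, num_electrodes):
    """Turn 1-indexed bad-channel lines into zero-indexed good channels."""
    bad = {int(line) - 1 for line in bad_lines if line.strip()}
    return [e for e in range(num_electrodes) if e not in bad]


breakdown = json.loads(BLOCK_BREAKDOWNS_JSON)["400"]
print(blocks_for_dataset(breakdown, BLOCK_TYPES, "training"))    # ['1', '3']
print(blocks_for_dataset(breakdown, BLOCK_TYPES, "validation"))  # ['2']
print(good_electrodes_from_bad(["1", "4", ""], num_electrodes=6))  # [1, 2, 4, 5]
```

Note that block "4" appears in no dataset at all: its default dataset is validation, but its type (mocha-3) is not allowed there, which is the "excluded altogether" case described under block_types.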
- ECoGDataGenerator, found in data_generators.py, is a shell class for generating data--more particularly for writing out the TFRecords that will be used for training and assessing your model--that plays nicely with the other classes. However, three of its (required!) methods are unspecified because they depend on how you store your data. (Dummy versions appear in ECoGDataGenerator; you can inspect their inputs and outputs there.) You should subclass ECoGDataGenerator and fill in these methods:

  - _ecog_token_generator: a Python generator that yields one dict per trial, each entry of which is one of the inputs or outputs for that trial. For example, the entries might be ecog_sequence, text_sequence, audio_sequence, and phoneme_sequence. The last two are not strictly necessary for speech decoding and can be left out--or you can add more. Just make sure that you return at least the data structures requested in the data_mapping specified in the manifest. So e.g. if the data_mapping is

        data_mapping = {'decoder_targets': 'text_sequence', 'encoder_inputs': 'ecog_sequence'}

    then _ecog_token_generator must yield dictionaries containing at least (but not limited to) a text_sequence and an ecog_sequence. The entire dictionary will be written to a TFRecord (one for each block), so it's better to yield more rather than fewer data structures, in case you change your mind later about the data_mapping but don't want to have to rewrite all the TFRecords.

    And one more thing: the text_sequence_vocab_file key in the experiment manifest is linked to the text_sequence in this data mapping. So if you plan to call your decoder_targets something else, say my_words, then make sure to rename the key in the experiment manifest that points to a vocab file to my_words_vocab_file.

  - _get_wav_data: should return the sampling_rate and audio signal for one (e.g.) block of audio data. This will allow you to make use of the built-in _get_MFCC_features in constructing your _ecog_token_generator. If you're never going to generate an audio_sequence, however, you can ignore it.

  - _query: should return the total number of examples in a group of blocks. This will allow you to allocate memory efficiently when using the get method. However, the methods _query and get are not used elsewhere in the code; they are convenience functions for examining the data directly rather than through a TFRecord.
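To make the contract for _ecog_token_generator concrete, here is a stand-alone toy sketch of a generator that yields one dict per trial, together with a check against the data_mapping. It deliberately does not import ecog2txt: the real method's signature and the expected array types are not shown in this README (inspect the dummy versions in ECoGDataGenerator for those), and the names `toy_token_generator`, `missing_keys`, `vocab_file_key`, and `trial_length` are hypothetical.

```python
import random

# Copied from the data_mapping example above.
DATA_MAPPING = {
    "decoder_targets": "text_sequence",
    "encoder_inputs": "ecog_sequence",
}


def toy_token_generator(sentences, num_channels=4, samples_per_word=3, seed=0):
    """Yield one dict per trial: made-up ECoG samples plus the target words."""
    rng = random.Random(seed)
    for sentence in sentences:
        words = sentence.split()
        num_samples = samples_per_word * len(words)
        yield {
            # (time x channels) nested list standing in for real neural data
            "ecog_sequence": [
                [rng.gauss(0.0, 1.0) for _ in range(num_channels)]
                for _ in range(num_samples)
            ],
            "text_sequence": words,
            # Extra entry not named in DATA_MAPPING; harmless to include
            "trial_length": num_samples,
        }


def missing_keys(trial, data_mapping):
    """Return entries requested by data_mapping but absent from the trial."""
    return sorted(set(data_mapping.values()) - set(trial))


def vocab_file_key(target_name):
    """Manifest key that points to the vocab file for `target_name`."""
    return target_name + "_vocab_file"


trials = list(toy_token_generator(["the cat sat", "hello world"]))
assert all(not missing_keys(trial, DATA_MAPPING) for trial in trials)
print(len(trials), len(trials[0]["ecog_sequence"]), trials[1]["text_sequence"])
print(vocab_file_key("my_words"))  # my_words_vocab_file
```

A check like `missing_keys` is cheap to run on the first few yielded trials before writing any TFRecords, since a missing entry is much more expensive to discover after all the records have been written.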
- The basic commands to train a model are as follows (you can, e.g., run them in a Python notebook):

      import ecog2txt.trainers as e2t_trainers
      import ecog2txt.data_generators

      # CREATE A NEW MODEL
      trainer = e2t_trainers.MultiSubjectTrainer(
          experiment_manifest_name='YOUR_EXPERIMENT_manifest.yaml',
          subject_ids=[400, 401],
          SN_kwargs={
              'FF_dropout': 0.4,            # overwriting whatever is in the manifest
              'TEMPORALLY_CONVOLVE': True,  # overwriting whatever is in the manifest
          },
          DG_kwargs={
              'REFERENCE_BIPOLAR': True,    # overwriting whatever is in the manifest
          },
          ES_kwargs={
              'data_mapping': {             # overwriting whatever is in the manifest
                  'encoder_inputs': 'ecog_sequence',
                  'decoder_targets': 'text_sequence',
              },
          },
      )

      # MAKE SURE ALL THE TFRECORDS ARE WRITTEN
      for subject in trainer.ecog_subjects:
          subject.write_tf_records_maybe()
      trainer.subject_to_table()

      # TRAIN THE TWO SUBJECTS IN PARALLEL
      assessments = trainer.parallel_transfer_learn()
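
The "overwriting whatever is in the manifest" comments rely on the convention that a keyword argument left at None falls back to the value in the manifest. The toy class below sketches that pattern in isolation; it is an assumption-laden illustration, not the implementation used by this package or by machine_learning, and the class name `ManifestBackedConfig` and the manifest values are invented.

```python
# Hypothetical manifest contents; real manifests are YAML files with more keys.
MANIFEST = {
    "FF_dropout": 0.1,
    "TEMPORALLY_CONVOLVE": False,
    "grid_size": [16, 16],
}


class ManifestBackedConfig:
    """Toy class whose None-valued arguments fall back to the manifest."""

    def __init__(self, manifest, FF_dropout=None, TEMPORALLY_CONVOLVE=None,
                 grid_size=None):
        overrides = {
            "FF_dropout": FF_dropout,
            "TEMPORALLY_CONVOLVE": TEMPORALLY_CONVOLVE,
            "grid_size": grid_size,
        }
        for name, value in overrides.items():
            # None means "use the manifest"; any other value wins over it.
            setattr(self, name, manifest[name] if value is None else value)


SN_kwargs = {"FF_dropout": 0.4, "TEMPORALLY_CONVOLVE": True}
config = ManifestBackedConfig(MANIFEST, **SN_kwargs)
print(config.FF_dropout, config.TEMPORALLY_CONVOLVE, config.grid_size)
# 0.4 True [16, 16]
```

One consequence of this design is that None cannot itself be passed as an override: a value of None is always read as "use the manifest".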