This is a pipeline for converting English text into American Sign Language (ASL) video. It could also serve as a framework for translating other spoken languages into sign languages.
Currently, this repo contains 3 parts:
- lang2gloss: This converts English text to ASL gloss
- gloss2pose: This maps each ASL gloss to its corresponding pose video segment
- pose2sign: This translates pose videos into video of a human signing ASL
There is still much work to be done in this project, and more documentation and functionality will be added incrementally. For more information on this project, please check out these slides.
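The way the three parts chain together can be sketched as below. This is only an illustration of the data flow: every function here is a hypothetical toy stand-in, not this repo's actual API, and the real stages use a transformer, a pose lookup, and pix2pixHD rather than string manipulation.

```python
from typing import Callable, List


def toy_lang2gloss(text: str) -> List[str]:
    """Hypothetical stand-in: uppercase each word to mimic gloss notation."""
    return [word.upper() for word in text.split()]


def toy_gloss2pose(gloss: str) -> str:
    """Hypothetical stand-in: map a gloss to the name of a pose video segment."""
    return "pose-{}.mov".format(gloss)


def toy_pose2sign(pose_segments: List[str]) -> str:
    """Hypothetical stand-in: describe the video rendered from the segments."""
    return "signing video from " + ", ".join(pose_segments)


def run_pipeline(
    text: str,
    lang2gloss: Callable[[str], List[str]],
    gloss2pose: Callable[[str], str],
    pose2sign: Callable[[List[str]], str],
) -> str:
    """Run the three stages in order: text -> glosses -> pose segments -> video."""
    glosses = lang2gloss(text)
    pose_segments = [gloss2pose(gloss) for gloss in glosses]
    return pose2sign(pose_segments)


if __name__ == "__main__":
    print(run_pipeline("go library", toy_lang2gloss, toy_gloss2pose, toy_pose2sign))
```

Passing the stages in as callables keeps the sketch independent of how each one is implemented.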
This repo requires
- Ubuntu 18.04
- python 3.6.8
- CUDA 10
- tensorflow 1.14
- pytorch 1.1.0
You will also need an AWS account to create an S3 bucket, which stores the processed data used for training.
make deps
Note that this installs OpenPose, which may take more than 30 minutes.
make install
You will need to update the variables at the top of the Makefile and the scripts/lang2sign file, namely S3_BUCKET and AWS_DEFAULT_REGION.
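If you want to confirm that both variables were actually filled in, a small check along these lines may help. It is a sketch that assumes the variables are written as plain `NAME=value` style assignments (optionally with `export`, `?=`, or `:=`); the real files may be laid out differently, and the helper names are hypothetical.

```python
import re

REQUIRED = ("S3_BUCKET", "AWS_DEFAULT_REGION")

# Assumed assignment forms: NAME=value, NAME ?= value, NAME := value, export NAME=value.
ASSIGNMENT = re.compile(
    r"^[ \t]*(?:export[ \t]+)?(S3_BUCKET|AWS_DEFAULT_REGION)[ \t]*[:?]?=[ \t]*(.*)$",
    re.MULTILINE,
)


def read_settings(text):
    """Return {name: value} for every matching assignment found in the text."""
    settings = {}
    for name, value in ASSIGNMENT.findall(text):
        settings[name] = value.strip().strip("\"'")
    return settings


def unset_settings(text):
    """List the required variables that are missing or assigned an empty value."""
    settings = read_settings(text)
    return [name for name in REQUIRED if not settings.get(name)]


if __name__ == "__main__":
    for path in ("Makefile", "scripts/lang2sign"):
        with open(path) as handle:
            print(path, "unset:", unset_settings(handle.read()))
```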
make test
Currently, this just runs Python linting.
You can download the pretrained transformer checkpoint (trained for 100000 steps) as a compressed archive (.tar.gz file) from Google Drive. You'll have to extract the .tar.gz file.
You can download a pretrained pix2pixHD model as a compressed archive (.tar.gz file) from Google Drive.
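Extracting a downloaded archive can be done with `tar`, or from Python as in the sketch below. The function name and the example paths are hypothetical; only use this on archives you trust, since `extractall` writes whatever paths the archive contains.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(str(dest))
    return names


if __name__ == "__main__":
    # Hypothetical file name; substitute the archive you downloaded.
    print(extract_archive("lang2gloss-checkpoint.tar.gz", "models/lang2gloss-transformer"))
```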
- Make sure you followed the steps in the Dependencies, Installation, and Configs sections
- Put your pretrained lang2gloss transformer model checkpoint in the subdirectory
models/lang2gloss-transformer/. (You'll need to put the .index, .meta, and .data files in this directory, all named with the model.ckpt prefix.)
- Put your video-metadata.csv file in
data/raw/gloss2pose/video-metadata.csv. You can download a premade one from Google Drive.
- Put your lookup files in your S3 bucket under
gloss2pose/lookup/. You can download a premade lookup as a compressed archive (.tar.gz file) from Google Drive. Your pose lookup video files should have this structure in S3:
gloss2pose/lookup/
pose-1.mov
pose-2.mov
pose-3.mov
.
.
.
- Make sure to download the pretrained embeddings:
make data pretrained-embeddings preprocess
- Clone my fork of the pix2pixHD repo into this directory
git clone https://github.com/monkeyhippies/pix2pixHD.git
- Put your pretrained pix2pixHD models into
pix2pixHD/pix2pixHD/checkpoints/pose2sign/. This should be 2 .pth files, one for the generator and one for the discriminator. You can download a pretrained model as a compressed archive (.tar.gz file) from Google Drive.
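Before running the pipeline, it can help to confirm that the local files from the steps above are in place. The sketch below checks only the local paths named in this README (it does not look at the S3 lookup), and it assumes the checkpoint files start with `model.ckpt`. The function name is hypothetical.

```python
from pathlib import Path


def missing_inputs(repo_root):
    """Return a list of human-readable problems with the expected local files."""
    root = Path(repo_root)
    problems = []

    # lang2gloss checkpoint: expects .index, .meta, and .data files.
    ckpt_dir = root / "models" / "lang2gloss-transformer"
    for pattern in ("model.ckpt*.index", "model.ckpt*.meta", "model.ckpt*.data*"):
        if not list(ckpt_dir.glob(pattern)):
            problems.append("no file matching {} in {}".format(pattern, ckpt_dir))

    # gloss2pose video metadata.
    metadata = root / "data" / "raw" / "gloss2pose" / "video-metadata.csv"
    if not metadata.is_file():
        problems.append("missing {}".format(metadata))

    # pose2sign models: one generator and one discriminator .pth file.
    pth_dir = root / "pix2pixHD" / "pix2pixHD" / "checkpoints" / "pose2sign"
    if len(list(pth_dir.glob("*.pth"))) < 2:
        problems.append("expected 2 .pth files in {}".format(pth_dir))

    return problems


if __name__ == "__main__":
    for problem in missing_inputs("."):
        print(problem)
```

An empty list means every local file the steps mention was found.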
Example:
scripts/lang2sign "Tomorrow I will go to the library"
To train the lang2gloss transformer with pretrained GloVe embeddings, run
make train-lang2sign
Note that you'll first have to download and preprocess the training data, which can be done with the command below:
make deps install data pretrained-embeddings preprocess
- Clone my fork of pix2pixHD.
git clone https://github.com/monkeyhippies/pix2pixHD.git
- Put your preprocessed training data in the pix2pixHD repo under the subdirectory
datasets/pose2sign/. Your data should look like this:
pix2pixHD/datasets/pose2sign/
train_A/
segment-1-0001.jpg
segment-1-0002.jpg
.
.
.
train_B/
pose-1-0001.jpg
pose-1-0002.jpg
.
.
.
test_A/
segment-101-0001.jpg
segment-101-0002.jpg
.
.
.
test_B/
pose-101-0001.jpg
pose-101-0002.jpg
.
.
.
- Within the pix2pixHD repo, run the following command to train:
python3 train.py --name pose2sign --dataroot /home/ubuntu/pix2pixHD/datasets/pose2sign/ --label_nc 0 --no_instance --resize_or_crop None
You can download a pre-processed train and test dataset from Google Drive. Training on the pre-processed dataset for 11 epochs, which produces reasonable results, takes around 10 days on a single Tesla K80 GPU.
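Since a training run takes days, a quick check of the dataset layout before starting may save time. The sketch below counts the .jpg frames in each of the four folders shown above and flags any split whose A and B folders are empty or differ in size. Treating equal frame counts as a requirement is an assumption here, based on the A and B images being used as pairs; the function names are hypothetical.

```python
from pathlib import Path

FOLDERS = ("train_A", "train_B", "test_A", "test_B")


def count_frames(dataset_root):
    """Count .jpg files in each expected folder (0 if the folder is missing)."""
    root = Path(dataset_root)
    return {name: len(list((root / name).glob("*.jpg"))) for name in FOLDERS}


def unpaired_splits(counts):
    """Return the splits whose A and B folders are empty or differ in size."""
    bad = []
    for split in ("train", "test"):
        a_count = counts[split + "_A"]
        b_count = counts[split + "_B"]
        if a_count == 0 or a_count != b_count:
            bad.append(split)
    return bad


if __name__ == "__main__":
    counts = count_frames("pix2pixHD/datasets/pose2sign")
    print(counts)
    print("splits needing attention:", unpaired_splits(counts))
```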
If you would like to create the pose lookup from scratch:
make create-video-metadata create-video-lookup
You will be prompted to provide AWS keys with S3 permissions to store the lookup. Make sure you've already finished the steps in the Configs section of this README before running the above command. Also note that processing everything required ~50 hours on a Tesla K80 GPU.