Spoken Language to Sign Language Translation

This is a pipeline for converting English text into American Sign Language (ASL) video. It could also serve as a framework for translating spoken language to sign language more generally.

Currently, this repo contains three parts:

lang2gloss: converts English text to ASL gloss
gloss2pose: maps each ASL gloss to its corresponding pose video segment
pose2sign: translates pose videos into video of a human signing ASL
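The three parts chain together, each consuming the previous one's output. The sketch below only illustrates that data flow. The function names mirror the component names, but the signatures and the toy logic inside them are assumptions made for illustration, not the repo's actual API.

```python
from typing import List


def lang2gloss(text: str) -> List[str]:
    """Placeholder: the real stage uses a transformer; here tokens are just uppercased."""
    return [token.upper() for token in text.split()]


def gloss2pose(glosses: List[str]) -> List[str]:
    """Placeholder: map each gloss to a hypothetical pose segment identifier."""
    return ["pose-segment:" + gloss for gloss in glosses]


def pose2sign(pose_segments: List[str]) -> str:
    """Placeholder: the real stage renders video frames; here the output is described."""
    return "video rendered from {} pose segments".format(len(pose_segments))


def lang2sign(text: str) -> str:
    """Chain the three stages: text -> gloss -> pose -> signing video."""
    return pose2sign(gloss2pose(lang2gloss(text)))


if __name__ == "__main__":
    print(lang2sign("Tomorrow I will go to the library"))
```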

There is still much work to be done in this project, and more documentation and functionality will be added incrementally. For more information on this project, please check out these slides.

Requisites

This repo requires

Ubuntu 18.04
Python 3.6.8
CUDA 10
TensorFlow 1.14
PyTorch 1.1.0

You will also need an AWS account to create an S3 bucket, which stores the processed data used for training.

Dependencies

make deps

Note that this installs OpenPose, which may take more than 30 minutes.

Installation

make install

Configs

You will need to update the variables at the top of the Makefile and of the scripts/lang2sign file, namely S3_BUCKET and AWS_DEFAULT_REGION.
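To catch a forgotten setting early, something like the following could scan a file's text for the two variables. This is a minimal sketch: it assumes the variables are set with simple one-line assignments (`NAME = value`, `NAME := value`, or shell-style `NAME=value`), which may not match how the Makefile or script is actually written. The helper names are hypothetical.

```python
import re
from typing import Dict, List

REQUIRED = ("S3_BUCKET", "AWS_DEFAULT_REGION")

# Matches simple assignments such as `NAME = value`, `NAME := value`,
# `NAME ?= value`, and shell-style `export NAME="value"`.
ASSIGNMENT = re.compile(
    r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*[:?]?=\s*(.*?)\s*$"
)


def read_assignments(text: str) -> Dict[str, str]:
    """Collect variable assignments found in Makefile or shell script text."""
    values = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2).strip("\"'")
    return values


def missing_settings(text: str) -> List[str]:
    """Return the required variables that are absent or set to an empty value."""
    values = read_assignments(text)
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    with open("Makefile") as handle:
        print(missing_settings(handle.read()))
```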

Test

make test

Currently, this only runs Python linting.

Run Inference

Pretrained models

Lang2Gloss

You can download the pretrained transformer checkpoint (trained for 100000 steps) as a compressed archive (.tar.gz file) from Google Drive. You'll have to extract the .tar.gz file.

Pose2Sign

You can download a pretrained pix2pixHD model as a compressed archive (.tar.gz file) from Google Drive.

Setup

Make sure you have followed the steps in the Dependencies, Installation, and Configs sections.
Put your pretrained lang2gloss transformer model checkpoint in the subdirectory models/lang2gloss-transformer/. (You'll need to put the .index, .meta, and .data files in this directory, all named with the model.ckpt prefix.)
Put your video-metadata.csv file in data/raw/gloss2pose/video-metadata.csv. You can download a premade one from Google Drive.
Put your lookup files in your S3 bucket under gloss2pose/lookup/. You can download a premade lookup as a compressed archive (.tar.gz file) from Google Drive. Your pose lookup video files should have this structure in S3:

gloss2pose/lookup/
    pose-1.mov
    pose-2.mov
    pose-3.mov
        .
        .
        .

Download the data and pretrained embeddings, then run preprocessing:

make data pretrained-embeddings preprocess

Clone my fork of the pix2pixHD repo into this directory:

git clone https://github.com/monkeyhippies/pix2pixHD.git

Put your pretrained pix2pixHD models into pix2pixHD/pix2pixHD/checkpoints/pose2sign/. This should be 2 .pth files, one for the generator and one for the discriminator. You can download a pretrained model as a compressed archive (.tar.gz file) from Google Drive.
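The local file placements from the steps above can be sanity-checked before running inference. The following is a minimal sketch that uses only the paths named in this section; it does not inspect the S3 bucket, and the `setup_problems` helper is a hypothetical name, not part of the repo.

```python
import os
from typing import List

# Paths taken from the setup steps above, relative to the repo root.
CHECKPOINT_DIR = os.path.join("models", "lang2gloss-transformer")
METADATA_CSV = os.path.join("data", "raw", "gloss2pose", "video-metadata.csv")
PIX2PIX_DIR = os.path.join("pix2pixHD", "pix2pixHD", "checkpoints", "pose2sign")


def list_names(directory: str) -> List[str]:
    """Return the file names in a directory, or an empty list if it is absent."""
    return sorted(os.listdir(directory)) if os.path.isdir(directory) else []


def setup_problems(repo_root: str) -> List[str]:
    """Collect human-readable descriptions of missing setup files."""
    problems = []
    checkpoint_names = list_names(os.path.join(repo_root, CHECKPOINT_DIR))
    for part in (".index", ".meta", ".data"):
        # TensorFlow data shards are usually named like `.data-00000-of-00001`,
        # so look for the marker anywhere in the name rather than at the end.
        if not any(part in name for name in checkpoint_names):
            problems.append("no {} file in {}".format(part, CHECKPOINT_DIR))
    if not os.path.isfile(os.path.join(repo_root, METADATA_CSV)):
        problems.append("missing {}".format(METADATA_CSV))
    pth_names = [
        name
        for name in list_names(os.path.join(repo_root, PIX2PIX_DIR))
        if name.endswith(".pth")
    ]
    if len(pth_names) < 2:
        problems.append(
            "expected 2 .pth files in {}, found {}".format(PIX2PIX_DIR, len(pth_names))
        )
    return problems


if __name__ == "__main__":
    for problem in setup_problems("."):
        print(problem)
```

An empty result means every local file the steps mention was found; it says nothing about whether the checkpoints themselves are valid.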

Run

Example:

scripts/lang2sign "Tomorrow I will go to the library"

Train

Lang2gloss

To train the lang2gloss transformer with pretrained GloVe embeddings, run

make train-lang2sign

Note that you'll first have to download and preprocess the training data, which can be done with the command below:

make deps install data pretrained-embeddings preprocess

Pose2Sign

Clone my fork of pix2pixHD.

git clone https://github.com/monkeyhippies/pix2pixHD.git

Put your preprocessed training data in the pix2pixHD repo under the subdirectory datasets/pose2sign/. Your data should look like this:

pix2pixHD/datasets/pose2sign/
    train_A/
        segment-1-0001.jpg
        segment-1-0002.jpg
            .
            .
            .
    train_B/
        pose-1-0001.jpg
        pose-1-0002.jpg
            .
            .
            .
    test_A/
        segment-101-0001.jpg
        segment-101-0002.jpg
            .
            .
            .
    test_B/
        pose-101-0001.jpg
        pose-101-0002.jpg
            .
            .
            .
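Before starting a long training run, it may be worth confirming that the folders are populated consistently. The sketch below assumes that a frame named `segment-N-FFFF.jpg` in an `_A` folder is meant to pair with `pose-N-FFFF.jpg` in the matching `_B` folder. That pairing rule is inferred from the example file names above, not confirmed by the repo, and the helper names are hypothetical.

```python
import os
import re
from typing import Dict, List, Set, Tuple

# File names follow the pattern shown above: <prefix>-<clip>-<frame>.jpg
FRAME = re.compile(r"^(segment|pose)-(\d+)-(\d+)\.jpg$")


def frame_ids(folder: str, prefix: str) -> Set[Tuple[int, int]]:
    """Return the (clip, frame) pairs of files in `folder` that use `prefix`."""
    ids = set()
    for name in os.listdir(folder):
        match = FRAME.match(name)
        if match and match.group(1) == prefix:
            ids.add((int(match.group(2)), int(match.group(3))))
    return ids


def unpaired_frames(dataroot: str) -> Dict[str, List[Tuple[int, int]]]:
    """For each split, list (clip, frame) ids present in only one of the A/B folders."""
    report = {}
    for split in ("train", "test"):
        side_a = frame_ids(os.path.join(dataroot, split + "_A"), "segment")
        side_b = frame_ids(os.path.join(dataroot, split + "_B"), "pose")
        # Symmetric difference: ids missing a partner on either side.
        report[split] = sorted(side_a ^ side_b)
    return report


if __name__ == "__main__":
    print(unpaired_frames(os.path.join("pix2pixHD", "datasets", "pose2sign")))
```

An empty list for both splits means every frame has a partner under the assumed naming rule.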

Within the pix2pixHD repo, run the following command to train:

python3 train.py --name pose2sign --dataroot /home/ubuntu/pix2pixHD/datasets/pose2sign/ --label_nc 0 --no_instance --resize_or_crop None

You can download a pre-processed train and test dataset from Google Drive. Training on the pre-processed dataset for 11 epochs, which produces reasonable results, takes around 10 days on a single Tesla K80 GPU.
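For planning purposes, the figure above works out to roughly 22 hours per epoch. The sketch below does that arithmetic and extrapolates to other epoch counts. It assumes training time scales linearly with the number of epochs on the same hardware and dataset, which is only an approximation.

```python
def hours_per_epoch(total_days: float, epochs: int) -> float:
    """Average wall-clock hours per epoch for a run of `epochs` epochs."""
    return total_days * 24.0 / epochs


def estimated_days(
    epochs: int, reference_days: float = 10.0, reference_epochs: int = 11
) -> float:
    """Linear extrapolation from the reference run (11 epochs in about 10 days)."""
    return epochs * reference_days / reference_epochs


if __name__ == "__main__":
    print(round(hours_per_epoch(10.0, 11), 1))  # about 21.8 hours per epoch
    print(round(estimated_days(5), 1))          # about 4.5 days for 5 epochs
```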

Pose Lookup Creation

If you would like to create the pose lookup from scratch:

make create-video-metadata create-video-lookup

You will be prompted to provide AWS keys with S3 permissions to store the lookup. Make sure you've already finished the steps in the Configs section of this README before running the above command. Also note that processing everything required about 50 hours on a Tesla K80 GPU.
