monkeyhippies/lang2sign

Spoken Language to Sign Language Translation

This is a pipeline for converting English text into American Sign Language (ASL) video. It could also serve as a framework for translating spoken languages into sign languages more generally.

Currently, this repo contains 3 parts:

  • lang2gloss: This converts English text to ASL gloss
  • gloss2pose: This maps ASL glosses to their corresponding pose video segments
  • pose2sign: This translates pose videos into video of a human signing ASL
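The three parts chain into a single text-to-video flow. The sketch below illustrates only that data flow: the function bodies, the toy gloss rule, and the lookup table are hypothetical placeholders, not this repo's actual API.

```python
from typing import Dict, List


def lang2gloss(text: str) -> List[str]:
    """Placeholder: the real stage uses a trained transformer.

    Upper-casing each word merely stands in for producing gloss tokens.
    """
    return [word.upper() for word in text.split()]


def gloss2pose(glosses: List[str], lookup: Dict[str, str]) -> List[str]:
    """Map each gloss to a pose video segment, skipping unknown glosses."""
    return [lookup[gloss] for gloss in glosses if gloss in lookup]


def pose2sign(pose_segments: List[str]) -> List[str]:
    """Placeholder: the real stage renders frames with pix2pixHD."""
    return [f"rendered({segment})" for segment in pose_segments]


def translate(text: str, lookup: Dict[str, str]) -> List[str]:
    """Chain the stages: text -> gloss -> pose segments -> signing video."""
    return pose2sign(gloss2pose(lang2gloss(text), lookup))


if __name__ == "__main__":
    # Hypothetical lookup entries, for illustration only.
    toy_lookup = {"TOMORROW": "pose-1.mov", "LIBRARY": "pose-2.mov"}
    print(translate("tomorrow library", toy_lookup))
```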

There is still much work to be done in this project, and more documentation and functionality will be added incrementally. For more information on this project, please check out these slides.

Requisites

This repo requires

  • Ubuntu 18.04
  • python 3.6.8
  • CUDA 10
  • tensorflow 1.14
  • pytorch 1.1.0

You will also need an AWS account to create an S3 bucket, which stores the processed data used for training.

Dependencies

make deps

Note that this installs OpenPose, which may take more than 30 minutes.

Installation

make install

Configs

You will need to update two variables, S3_BUCKET and AWS_DEFAULT_REGION, at the top of both the Makefile and the scripts/lang2sign file.
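As a quick sanity check before running anything, something like the following could confirm that both variables have been filled in. This is a rough sketch: it assumes the variables are set with simple `NAME = value` style assignments (also accepting `:=`, `?=`, and an optional `export`), which may not match how the Makefile or scripts/lang2sign actually declares them.

```python
import re
from typing import Dict, Iterable

REQUIRED = ("S3_BUCKET", "AWS_DEFAULT_REGION")


def read_assignments(text: str, names: Iterable[str] = REQUIRED) -> Dict[str, str]:
    """Return the first assigned value for each name, or "" if unset."""
    values = {}
    for name in names:
        # Assumed assignment forms: NAME=value, NAME := value, NAME ?= value.
        pattern = (
            r"^[ \t]*(?:export[ \t]+)?" + re.escape(name)
            + r"[ \t]*[:?]?=[ \t]*(.*?)[ \t]*$"
        )
        match = re.search(pattern, text, flags=re.MULTILINE)
        values[name] = match.group(1).strip("\"'") if match else ""
    return values


def unset_variables(text: str) -> list:
    """List the required variables that are missing or empty in the text."""
    return [name for name, value in read_assignments(text).items() if not value]


# Example usage (paths taken from this README):
#   for path in ("Makefile", "scripts/lang2sign"):
#       with open(path) as handle:
#           print(path, unset_variables(handle.read()))
```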

Test

make test

Currently, this only runs Python linting.

Run Inference

Pretrained models

Lang2Gloss

You can download the archived and compressed (.tar.gz file) pretrained transformer checkpoint (trained for 100000 steps) from Google Drive. You'll have to extract the .tar.gz file.
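If you prefer to extract the archive from Python rather than with `tar`, a minimal sketch using the standard library follows. The archive file name in the usage comment is a hypothetical placeholder; use whatever name the download has.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Refuse members that would be written outside dest_dir.
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(str(dest))
        return archive.getnames()


# Example usage (hypothetical archive name):
#   extract_archive("lang2gloss-checkpoint.tar.gz", "models/lang2gloss-transformer")
```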

Pose2Sign

You can download an archived and compressed (.tar.gz file) pretrained pix2pixHD model from Google Drive.

Setup

  1. Make sure you have followed the steps in the Dependencies, Installation, and Configs sections.
  2. Put your pretrained lang2gloss transformer model checkpoint in the subdirectory models/lang2gloss-transformer/. You'll need to put the .index, .meta, and .data files in this directory, each named with the model.ckpt prefix.
  3. Put your video-metadata.csv file in data/raw/gloss2pose/video-metadata.csv. You can download a premade one from Google Drive.
  4. Put your lookup files in your S3 bucket under gloss2pose/lookup/. You can download an archived and compressed (.tar.gz file) premade lookup from Google Drive. Your pose lookup video files should have this structure in S3:
gloss2pose/lookup/
    pose-1.mov
    pose-2.mov
    pose-3.mov
        .
        .
        .
  5. Download the data and pretrained embeddings, then preprocess them
make data pretrained-embeddings preprocess
  6. Clone my fork of the pix2pixHD repo into this directory
git clone https://github.com/monkeyhippies/pix2pixHD.git
  7. Put your pretrained pix2pixHD models into pix2pixHD/pix2pixHD/checkpoints/pose2sign/. This should be 2 .pth files, one for the generator and one for the discriminator. You can download an archived and compressed (.tar.gz file) pretrained model from Google Drive.
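Before running inference, it can help to confirm that the local files from the steps above are in place. The following is a minimal sketch of such a check, using only the paths named in this section. The glob patterns for the checkpoint file names are an assumption about how the files are named, and the S3 lookup is not checked because that would require AWS access.

```python
from pathlib import Path
from typing import List


def missing_setup_files(repo_root: str) -> List[str]:
    """Return descriptions of local setup artifacts that are not found."""
    root = Path(repo_root)
    missing = []

    # Transformer checkpoint: expect .index, .meta, and .data files.
    # The pattern also tolerates step numbers or shard suffixes (assumption).
    ckpt_dir = root / "models" / "lang2gloss-transformer"
    for ext in (".index", ".meta", ".data"):
        if not any(ckpt_dir.glob("model.ckpt*" + ext + "*")):
            missing.append("{}/model.ckpt*{}".format(ckpt_dir, ext))

    # Video metadata for gloss2pose.
    metadata = root / "data" / "raw" / "gloss2pose" / "video-metadata.csv"
    if not metadata.is_file():
        missing.append(str(metadata))

    # pix2pixHD weights: one generator and one discriminator .pth file.
    pth_dir = root / "pix2pixHD" / "pix2pixHD" / "checkpoints" / "pose2sign"
    if len(list(pth_dir.glob("*.pth"))) < 2:
        missing.append("{}/*.pth (expected 2 files)".format(pth_dir))

    return missing


if __name__ == "__main__":
    for item in missing_setup_files("."):
        print("missing:", item)
```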

Run

Example:

scripts/lang2sign "Tomorrow I will go to the library"

Train

Lang2Gloss

To train the lang2gloss transformer with pretrained GloVe embeddings, run

make train-lang2sign

Note that you'll first have to download and preprocess the training data, which can be done with the command below

make deps install data pretrained-embeddings preprocess

Pose2Sign

  1. Clone my fork of pix2pixHD.
git clone https://github.com/monkeyhippies/pix2pixHD.git
  2. Put your preprocessed training data in the pix2pixHD repo under the subdirectory datasets/pose2sign/. Your data should look like this:
pix2pixHD/datasets/pose2sign/
    train_A/
        segment-1-0001.jpg
        segment-1-0002.jpg
            .
            .
            .
    train_B/
        pose-1-0001.jpg
        pose-1-0002.jpg
            .
            .
            .
    test_A/
        segment-101-0001.jpg
        segment-101-0002.jpg
            .
            .
            .
    test_B/
        pose-101-0001.jpg
        pose-101-0002.jpg
            .
            .
            .
  3. Within the pix2pixHD repo, run the following command to train:
python3 train.py --name pose2sign --dataroot /home/ubuntu/pix2pixHD/datasets/pose2sign/ --label_nc 0 --no_instance --resize_or_crop None
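Since training runs for days, it may be worth checking the dataset layout before starting. The sketch below counts the .jpg frames in each of the four subdirectories shown above and flags empty or mismatched splits. It assumes that the A and B folders of each split should contain the same number of frames, on the understanding that pix2pixHD pairs them by sorted file order.

```python
from pathlib import Path
from typing import Dict, List

SUBDIRS = ("train_A", "train_B", "test_A", "test_B")


def count_frames(dataroot: str) -> Dict[str, int]:
    """Count .jpg frames in each expected subdirectory (0 if it is absent)."""
    root = Path(dataroot)
    return {name: len(list((root / name).glob("*.jpg"))) for name in SUBDIRS}


def layout_problems(counts: Dict[str, int]) -> List[str]:
    """Describe splits that are empty or whose A/B frame counts differ."""
    problems = []
    for split in ("train", "test"):
        a_count = counts[split + "_A"]
        b_count = counts[split + "_B"]
        if a_count == 0 or b_count == 0:
            problems.append("{}: no frames found in A or B".format(split))
        elif a_count != b_count:
            problems.append(
                "{}: {} frames in A but {} in B".format(split, a_count, b_count)
            )
    return problems


if __name__ == "__main__":
    # Path as used in the training command of this README.
    found = count_frames("/home/ubuntu/pix2pixHD/datasets/pose2sign/")
    print(found, layout_problems(found))
```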

You can download a pre-processed train and test dataset from Google Drive. Training on the pre-processed dataset for 11 epochs, which produces reasonable results, takes around 10 days on a single Tesla K80 GPU.

Pose Lookup Creation

If you would like to create the pose lookup from scratch:

make create-video-metadata create-video-lookup

You will be prompted to provide AWS keys with S3 permissions, which are used to store the lookup. Make sure you have finished the steps in the Configs section of this README before running the above command. Also note that processing everything required about 50 hours on a Tesla K80 GPU.
