JWAE-PyTorch

This repository contains the PyTorch implementation of the paper:

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings (ICCV Workshops 2019)

The method aligns embeddings from different modalities using joint Wasserstein autoencoders.

Requirements

The code is written for Python 2.7.0 and CUDA 9.0.

Required packages:

  • torch 0.3
  • torchvision 0.3.0
  • nltk 3.5
  • gensim
  • Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt

To install the requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>
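
After activating the environment, it can help to confirm that the pinned packages actually resolved. The following is a minimal sketch, not part of the repository: the helper names `version_matches` and `check_environment` are hypothetical, and the version prefixes are copied from the list above (gensim is treated as unpinned because no version is given).

```python
from __future__ import print_function

import importlib

# Version prefixes copied from the requirements list; None means "any version".
REQUIRED = [
    ("torch", "0.3"),
    ("torchvision", "0.3.0"),
    ("nltk", "3.5"),
    ("gensim", None),
]


def version_matches(installed, required_prefix):
    """Return True if `installed` equals the prefix or extends it with '.<more>'."""
    if required_prefix is None:
        return True
    if installed is None:
        return False
    return installed == required_prefix or installed.startswith(required_prefix + ".")


def check_environment(required=REQUIRED):
    """Map each package name to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, prefix in required:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if version_matches(installed, prefix):
            report[name] = "ok"
        else:
            report[name] = "found {}".format(installed)
    return report


if __name__ == "__main__":
    for package, status in sorted(check_environment().items()):
        print("{:<12} {}".format(package, status))
```

A status other than `ok` only says that the installed version differs from the one listed here; it does not by itself show that the code will fail.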

Preprocessed Data

  1. The preprocessed COCO and Flickr30K datasets used in the experiments are based on SCAN and can be downloaded from:

    Place the downloaded datasets in the data folder.

  2. Run vocab.py to generate the vocabulary for the datasets:

python vocab.py --data_path data --data_name f30k_precomp
python vocab.py --data_path data --data_name coco_precomp
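
For readers unfamiliar with this step, the sketch below illustrates what a caption vocabulary typically looks like: a word-to-index mapping built from word counts, with a fallback index for unknown words. It is an illustration under stated assumptions and not the contents of vocab.py. The class name `ToyVocabulary`, the `build_vocab` helper, the `threshold` parameter, and the four special tokens are all hypothetical, and plain whitespace splitting stands in for the NLTK tokenizer listed in the requirements.

```python
from collections import Counter


class ToyVocabulary(object):
    """Bidirectional word <-> index mapping (illustrative, not the repo's class)."""

    def __init__(self):
        self.word2idx = {}
        self.idx2word = {}

    def add_word(self, word):
        # Assign the next free index to words seen for the first time.
        if word not in self.word2idx:
            index = len(self.word2idx)
            self.word2idx[word] = index
            self.idx2word[index] = word

    def __call__(self, word):
        # Unknown words fall back to the <unk> index.
        return self.word2idx.get(word, self.word2idx["<unk>"])

    def __len__(self):
        return len(self.word2idx)


def build_vocab(captions, threshold=1):
    """Keep words that occur at least `threshold` times across all captions."""
    counts = Counter()
    for caption in captions:
        # Whitespace tokenization keeps the sketch dependency-free.
        counts.update(caption.lower().split())

    vocab = ToyVocabulary()
    for token in ("<pad>", "<start>", "<end>", "<unk>"):
        vocab.add_word(token)
    for word in sorted(counts):
        if counts[word] >= threshold:
            vocab.add_word(word)
    return vocab


captions = ["A dog runs on the grass", "A cat sits on the mat"]
vocab = build_vocab(captions, threshold=2)
# Words below the threshold ("dog") or never seen ("moon") map to <unk>.
encoded = [vocab(w) for w in "a dog on the moon".split()]
print(len(vocab), encoded)
```

Raising `threshold` shrinks the vocabulary at the cost of mapping more rare words to the unknown token.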

Training

Train a new JWAE model using the following:

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH"

Evaluation

Evaluate the trained model using:

from vocab import Vocabulary
import evaluation
evaluation.evalrank("$CHECKPOINT_PATH", data_path="$DATA_PATH", split="test")
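
Note that `"$CHECKPOINT_PATH"` and `"$DATA_PATH"` are placeholders: Python does not expand shell variables inside string literals, so they must be replaced with real paths. One way to do this, sketched below, is to read the paths from environment variables. The helper names `resolve_eval_args` and `run_evaluation` are hypothetical, the environment variable names simply mirror the placeholders, and the `evalrank` call is assumed to take the arguments shown in the snippet above.

```python
import os


def resolve_eval_args(environ=None, split="test"):
    """Build the arguments for evalrank from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [k for k in ("CHECKPOINT_PATH", "DATA_PATH") if not environ.get(k)]
    if missing:
        raise KeyError("unset environment variable(s): " + ", ".join(missing))
    return {
        "model_path": os.path.expanduser(environ["CHECKPOINT_PATH"]),
        "data_path": os.path.expanduser(environ["DATA_PATH"]),
        "split": split,
    }


def run_evaluation(environ=None, split="test"):
    """Call the repository's evalrank; run this from the repository root."""
    # Imported lazily so resolve_eval_args works without the repository on the path.
    # The Vocabulary import mirrors the snippet above and is kept as an assumption.
    from vocab import Vocabulary  # noqa: F401
    import evaluation

    args = resolve_eval_args(environ, split)
    evaluation.evalrank(args["model_path"], data_path=args["data_path"], split=args["split"])
```

With `CHECKPOINT_PATH` and `DATA_PATH` exported in the shell, calling `run_evaluation()` from the repository root would then be equivalent to the snippet above with the placeholders filled in.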

Bibtex

@inproceedings{Mahajan:2019:JWA,
  author = {Shweta Mahajan and Teresa Botschen and Iryna Gurevych and Stefan Roth},
  booktitle = {ICCV Workshop on Cross-Modal Learning in Real World},
  title = {Joint {W}asserstein Autoencoders for Aligning Multi-modal Embeddings},
  year = {2019}
}

