embedding

This directory contains models for unsupervised training of word embeddings using the model described in:

Mikolov et al., Efficient Estimation of Word Representations in Vector Space, ICLR 2013.
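The paper introduces two architectures, CBOW and skip-gram. To show roughly how skip-gram training examples are formed, the sketch below pairs each word with every neighbour inside a fixed window. The function name `skipgram_pairs` and the fixed window size are assumptions for illustration. The original word2vec tool also samples a random window size per word and subsamples frequent words, and this sketch does neither.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every context word within `window` positions.

    Hypothetical helper: fixed window, no subsampling, no random window shrinking.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the window at the start and end of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
```

Each pair becomes one training example in which the model learns to predict the context word from the center word.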

Detailed instructions on how to get started and use them are available in the tutorials. Brief instructions are below.

Word2Vec Tutorial

Assuming you have cloned the git repository, navigate into this directory. To download the example text and evaluation data:

curl http://mattmahoney.net/dc/text8.zip > text8.zip
unzip text8.zip
curl https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip > source-archive.zip
unzip -p source-archive.zip word2vec/trunk/questions-words.txt > questions-words.txt
rm text8.zip source-archive.zip
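`questions-words.txt` holds the analogy questions used for evaluation. Lines beginning with `:` name a section (for example `: capital-common-countries`), and every other line is a four-word question such as `Athens Greece Baghdad Iraq`. The hypothetical `read_analogies` helper below is a minimal sketch of a parser for that layout. It lowercases each word on the assumption that the vocabulary is lowercase, which holds for text8.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Analogy = Tuple[str, str, str, str]


def read_analogies(lines: Iterable[str]) -> Dict[str, List[Analogy]]:
    """Group analogy questions by the ': section' header that precedes them.

    Hypothetical helper; lowercases words to match a lowercase vocabulary.
    """
    sections: Dict[str, List[Analogy]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Section header, e.g. ": capital-common-countries".
            current = line[1:].strip()
            sections.setdefault(current, [])
            continue
        words = line.lower().split()
        # Keep only well-formed four-word questions that follow a header.
        if len(words) == 4 and current is not None:
            sections[current].append((words[0], words[1], words[2], words[3]))
    return sections


if __name__ == "__main__":
    sample = [": capital-common-countries", "Athens Greece Baghdad Iraq"]
    print(read_analogies(sample))
```

To parse the downloaded file, pass an open file handle, e.g. `with open("questions-words.txt") as f: read_analogies(f)`.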

You will need to compile the ops as follows:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared word2vec_ops.cc word2vec_kernels.cc -o word2vec_ops.so -fPIC -I $TF_INC -O2 -D_GLIBCXX_USE_CXX11_ABI=0

On Mac, add -undefined dynamic_lookup to the g++ command.
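If you prefer to drive the build from a script, the g++ invocation above can be assembled as an argument list. The sketch below is one way to do that, assuming a hypothetical helper named `build_compile_command` that receives the TensorFlow include directory (the `TF_INC` value) as an argument. It adds the ABI flag by default and the macOS linker flag when running on Darwin. It only builds the list and does not run the compiler.

```python
import platform
from typing import List, Optional


def build_compile_command(tf_include: str, old_abi: bool = True,
                          system: Optional[str] = None) -> List[str]:
    """Assemble the g++ command from this README as an argument list (hypothetical helper)."""
    system = system or platform.system()
    cmd = ["g++", "-std=c++11", "-shared",
           "word2vec_ops.cc", "word2vec_kernels.cc",
           "-o", "word2vec_ops.so", "-fPIC", "-I", tf_include, "-O2"]
    if old_abi:
        # Use the pre-C++11 libstdc++ ABI; pass old_abi=False if TensorFlow
        # itself was compiled from source with gcc 5 or later.
        cmd.append("-D_GLIBCXX_USE_CXX11_ABI=0")
    if system == "Darwin":
        # macOS: leave TensorFlow symbols unresolved until the library is loaded.
        cmd += ["-undefined", "dynamic_lookup"]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_compile_command("/path/to/tensorflow/include")))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with the include path.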

(For an explanation of what the g++ command does, see the tutorial on Adding a New Op to TensorFlow. The flag -D_GLIBCXX_USE_CXX11_ABI=0 selects the older, pre-C++11 libstdc++ ABI, which keeps the op compatible with prebuilt TensorFlow binaries when you compile it with gcc 5 or later. However, if you compiled TensorFlow from source using gcc 5 or later, you may need to omit the flag.)

Then run using:

python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
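The `--eval_data` file contains analogy questions of the form "a is to b as c is to ?". A common way to answer them with word vectors, following Mikolov et al., is to pick the vocabulary word whose normalized vector is most similar to b - a + c, excluding the three question words. The sketch below does this in plain Python over a toy embedding dictionary. The embeddings and the `answer_analogy` name are made up for illustration, and this is not necessarily how `word2vec_optimized.py` computes its score.

```python
import math
from typing import Dict, List, Optional


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def answer_analogy(emb: Dict[str, List[float]], a: str, b: str, c: str) -> Optional[str]:
    """Return the word closest to b - a + c by cosine similarity, skipping a, b and c."""
    unit = {w: normalize(v) for w, v in emb.items()}
    target = [bv - av + cv for av, bv, cv in zip(unit[a], unit[b], unit[c])]
    best, best_score = None, -math.inf
    for w, v in unit.items():
        if w in (a, b, c):
            continue
        # v is unit length and target's norm is the same for every w,
        # so ranking by dot product gives the same order as cosine similarity.
        score = sum(x * y for x, y in zip(v, target))
        if score > best_score:
            best, best_score = w, score
    return best


if __name__ == "__main__":
    toy = {
        "man": [1.0, 0.0, 0.0], "woman": [1.0, 1.0, 0.0],
        "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0],
        "apple": [0.0, 0.0, 1.0],
    }
    print(answer_analogy(toy, "man", "woman", "king"))
```

Accuracy on the evaluation set is then the fraction of questions for which the returned word equals the fourth word of the question.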

Here is a short overview of what is in this directory.

| File | What's in it? |
| --- | --- |
| `word2vec.py` | A version of word2vec implemented using TensorFlow ops and minibatching. |
| `word2vec_test.py` | Integration test for `word2vec.py`. |
| `word2vec_optimized.py` | A version of word2vec implemented using custom C++ ops, with no minibatching. |
| `word2vec_optimized_test.py` | Integration test for `word2vec_optimized.py`. |
| `word2vec_kernels.cc` | Kernels for the custom input and training ops. |
| `word2vec_ops.cc` | The declarations of the custom ops. |
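`word2vec_optimized.py` trains one example at a time through the custom training op declared in `word2vec_ops.cc` and implemented in `word2vec_kernels.cc`. To give a feel for what a per-example update involves, the sketch below applies one stochastic gradient step of the skip-gram negative-sampling objective (negative sampling was introduced in a follow-up paper by Mikolov et al.), assuming that is the objective the optimized version trains. The names `emb` and `ctx` (input and output vectors), the learning rate and the hand-picked negative list are assumptions. This is not a transcription of the C++ kernel, which among other things draws negatives at random from a unigram distribution.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neg_sampling_step(emb: List[List[float]], ctx: List[List[float]],
                      center: int, context: int, negatives: List[int],
                      lr: float) -> None:
    """One in-place SGD step on log s(u_o . v_c) + sum_k log s(-u_k . v_c), s = sigmoid.

    Hypothetical helper: emb holds input vectors v, ctx holds output vectors u.
    """
    v = emb[center]
    grad_v = [0.0] * len(v)
    # The true context word gets label 1; each negative gets label 0.
    for word, label in [(context, 1.0)] + [(k, 0.0) for k in negatives]:
        u = ctx[word]
        score = sigmoid(sum(a * b for a, b in zip(u, v)))
        g = lr * (label - score)  # step size times gradient of the log-likelihood
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # accumulate; v is updated after the loop
            u[i] += g * v[i]       # update the output vector immediately
    for i in range(len(v)):
        v[i] += grad_v[i]
```

After a step, the dot product between the center word and its true context word increases, while the dot product with each negative decreases.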
