Diablo

This project was developed for CS 6601 Artificial Intelligence at the Georgia Institute of Technology. Its goal is paraphrase identification for tweets: given two tweets, determine whether they have the same meaning. We implement a sliding-window approach: we learn word embedding vectors with a neural language model, normalize them, and then apply dynamic pooling to obtain equally sized similarity matrices. We flatten each pooled matrix and append additional features (sentence length, placeholder word frequency such as punctuation and numbers, and common named-entity terms) to form the final feature vector. This vector is passed to a logistic regression classifier, which we train on our training set to distinguish similar from non-similar sentence pairs. We achieve an F-measure of 63.8%.
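The similarity-matrix, dynamic-pooling, and feature-assembly steps described above can be sketched in pure Python as follows. This is not the project's simMat.py or dp.py (their contents are not shown here); it is a minimal sketch that assumes cosine similarity between L2-normalized word vectors, max pooling over roughly equal row and column chunks, and a small fixed output grid. The names l2_normalize, similarity_matrix, dynamic_pool, and feature_vector, the grid size, and the two length features are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(emb_a, emb_b):
    """Cosine similarity between every word of tweet A and every word of tweet B.

    emb_a and emb_b are lists of word vectors (one vector per word).
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


def _bounds(length, size):
    """Split range(length) into `size` contiguous, non-empty (start, end) chunks.

    When length < size, indices are reused so every chunk still has one element.
    """
    return [
        (i * length // size, max((i + 1) * length // size, i * length // size + 1))
        for i in range(size)
    ]


def dynamic_pool(matrix, size=3):
    """Max-pool a variable-sized matrix into a fixed size x size grid (hypothetical size)."""
    rows = _bounds(len(matrix), size)
    cols = _bounds(len(matrix[0]), size)
    return [
        [max(matrix[r][c] for r in range(r0, r1) for c in range(c0, c1))
         for (c0, c1) in cols]
        for (r0, r1) in rows
    ]


def feature_vector(pooled, len_a, len_b, extra=()):
    """Flatten the pooled grid and append hand-crafted features.

    The two length features are assumptions; `extra` stands in for the other
    features (e.g. placeholder word frequency, common named-entity counts).
    Both lengths must be at least 1.
    """
    flat = [x for row in pooled for x in row]
    length_feats = [abs(len_a - len_b), min(len_a, len_b) / max(len_a, len_b)]
    return flat + length_feats + list(extra)
```

For example, with 2-dimensional toy vectors, a 2-word tweet and a 3-word tweet both pool to a 2x2 grid when size=2, so every tweet pair yields a feature vector of the same length regardless of tweet length, which is what a fixed-input classifier requires.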

Instructions on how to run:

To run unnormalized:

Change lines 2 and 3 in run.sh to use input.txt

sh run.sh

python simMat.py

python classifyTweets.py

To run normalized:

Check if normalizedInput.txt exists.

If yes:

Change lines 2 and 3 in run.sh to use normalizedInput.txt instead of input.txt

sh run.sh

python simMat.py

python classifyTweets.py

If no:

python twitterNormalizer.py (this should generate normalizedInput.txt)

Change lines 2 and 3 in run.sh to use normalizedInput.txt instead of input.txt

sh run.sh

python simMat.py

python classifyTweets.py
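The final step run by classifyTweets.py trains a logistic regression classifier and is scored by F-measure. The sketch below is a hypothetical, dependency-free stand-in, not the project's code: it fits logistic regression with plain stochastic gradient descent on log loss and computes the F-measure for the positive (paraphrase) class, assuming the balanced F1 score is meant. The function names, learning rate, and epoch count are assumptions; in practice a library implementation such as scikit-learn's LogisticRegression would normally be used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=200, seed=0):
    """Fit weights and bias with stochastic gradient descent on log loss.

    X is a list of feature vectors, y a list of 0/1 labels (1 = paraphrase).
    lr and epochs are hypothetical defaults.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            g = p - y[i]  # gradient of the log loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (similar) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


def f_measure(y_true, y_pred):
    """F1 score for the positive class (harmonic mean of precision and recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is reported for the positive class only, so a classifier that labels every pair as non-similar scores 0 even if most pairs really are non-similar; this is one reason F-measure is preferred over accuracy for imbalanced paraphrase data.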

Repository files:

data
mat_files_for_input
word2vec
README.md
baseSimMat.py
classifyTweets.py
classifyTweets_normalized.py
dict_dump.pickle
dp.py
dp.pyc
emnlp_dict.txt
input.txt
metrics.py
metrics.pyc
normalizedInput.txt
readme.txt
report.pdf
run.sh
simMat.py
train.data
tweetsToInput.py
twitterNormalizer.py

Repository: nidishrajendran/Diablo (4 contributors)