Tweets sentiment classifier group project

Project description

This project classifies tweets with specified hashtag into positive and negative ones. It's implemented using Scala language and Apache Spark framework and includes several stages:

Streaming - converting RSS stream of tweets from QueryFeed into Spark data abstraction - RDD (ready GitHub project was used for that)
Preprocessing - remove redundant characters and words, convert to lowercase, etc.
Model training - using models from Spark Machine Learning library and training them on Twitter Sentiment Analysis dataset.
Classifying tweets - using saved ML model for classifying tweets from RSS stream.

Project structure

classifier/ — Classification module
- Main - example class of how to operate with Model class (also generated pretrained model)
- MLFindModel - used to identify the model with best accuracy
- Model - class which represents the classifier
- ModelLoader - simple support static methods
preprocessing/ — Preprocessing module
- PreprocessingDataset - static method for preprocessing train dataset before learning
- PreprocessingExample - obvious from the name
- PreprocessTweet - class for processing text messages (described earlier)
streamer/ - streaming module

How to use a project

If you wish to train model - use example provided in classifier.Main
If you wish to start streaming rss & process it - run streamer.RSSDemo with one argument - tag (cat, dog, winter, etc.). Just one word, nothing else.

Outputs

Pretrained model is then stored close to classifier/twits/train.csv resource (in model folder).
Each new batch produced by RSS generates two artifacts - folder with initial data and after model process. (${tag}_input and ${tag}_result)

Configurations

All the manipulations with spark were done on local machine without usage of the cluster.
Paths in a program are configured to use files from local machine.
RDDs are generated each 10 seconds.

References

QueryFeed
RSS streaming GitHub repository
Twitter Sentiment Analysis

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
custom_results		custom_results
project		project
src/main		src/main
temp		temp
.gitignore		.gitignore
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Tweets sentiment classifier group project

Project description

Project structure

How to use a project

Outputs

Configurations

References

About

Uh oh!

Releases

Packages

Contributors 4

Uh oh!

Languages

evgerher/twitter-classifier

Folders and files

Latest commit

History

Repository files navigation

Tweets sentiment classifier group project

Project description

Project structure

How to use a project

Outputs

Configurations

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Uh oh!

Languages

Packages