Using twint to scrape Twitter, we apply natural language processing (NLP) techniques to analyze the sentiment of tweets relating to masks and coronavirus, classifying each tweet as Negative, Neutral, or Positive. Through text processing, exploratory data analysis (EDA), and feature engineering, we discover insights into how important words, topics, and subjectivity relate to sentiment. We then build predictive models to provide further insight and confirm our findings from EDA.
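For reference, here is a minimal twint sketch of the kind of search used to collect the data. The query terms, date range, and output file are illustrative assumptions, not the project's exact settings (see twitter_scraping_notebook.ipynb for those):

```python
import twint

# Configure a search for mask-related coronavirus tweets.
# Query, dates, and output path below are illustrative only.
c = twint.Config()
c.Search = "mask coronavirus"   # hypothetical query
c.Since = "2020-01-01"
c.Until = "2020-05-31"
c.Lang = "en"
c.Store_csv = True              # save results for the cleaning notebook
c.Output = "tweets.csv"
c.Hide_output = True            # don't print every tweet to the console

twint.run.Search(c)
```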
- How does the sentiment of tweets change over time?
- Hypothesis: Tweets will be more negative on average in January and get more positive on average as time goes on.
- Will Twitter stats (number of likes, replies, retweets) play a role in determining sentiment?
- Hypothesis: The most important features will most likely be the words themselves.
- Does topic modeling provide any insight toward tweet sentiment or the COVID-19 crisis?
- Hypothesis: Topic modeling should be a factor in determining sentiment and can give us insights into the pandemic.
- What insights can be provided by using machine learning?
- Hypothesis: The lion's share of the insights will come during EDA.
- What are the most frequent words? And do they play a role in determining sentiment?
- Tweets were generally more negative in January but relatively constant from February through May (there were also far fewer relevant tweets in January).
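As a sketch of how this monthly breakdown can be computed, the snippet below labels tweets with TextBlob polarity (the thresholds are illustrative, and the project may have used a different labeling scheme) and tallies sentiment shares per month with pandas; the file and column names are hypothetical:

```python
import pandas as pd
from textblob import TextBlob

df = pd.read_csv("tweets.csv", parse_dates=["date"])  # hypothetical file/columns

def label_sentiment(text):
    """Label a tweet by polarity sign; cutoffs here are illustrative."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.05:
        return "Positive"
    if polarity < -0.05:
        return "Negative"
    return "Neutral"

df["sentiment"] = df["tweet"].apply(label_sentiment)

# Share of each sentiment class per calendar month.
monthly = (
    df.groupby([df["date"].dt.to_period("M"), "sentiment"])
      .size()
      .unstack(fill_value=0)
)
print(monthly.div(monthly.sum(axis=1), axis=0).round(3))
```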
- After removing common English stopwords as well as topical stopwords like mask and virus, the ten most frequently occurring words were: hand, need, spread, protect, make, help, say, glove, public, and hospital.
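A minimal sketch of that frequency count, assuming NLTK's English stopword list plus a hand-picked topical list (the project's exact topical stopwords and tokenizer may differ):

```python
import re
from collections import Counter

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)

# English stopwords plus topical words that appear in nearly every tweet.
# The topical set below is an assumption, not the project's exact list.
stop_words = set(stopwords.words("english")) | {
    "mask", "masks", "virus", "coronavirus", "covid", "covid19",
}

def tokenize(text):
    # Lowercase, keep alphabetic tokens, drop stopwords.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

tweets = [
    "Wash your hands and wear a mask to protect others",
    "Hospitals need gloves and masks to help the public",
]  # stand-in corpus; the project used the scraped tweets

counts = Counter(word for tweet in tweets for word in tokenize(tweet))
print(counts.most_common(10))
```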
- A 10-topic LDA model grouped words into the following topics, each with its predominant sentiment (a modeling sketch follows this list):
- Healthcare workers, hospitals: split positive/negative
- Social distancing: positive
- Protesting, lockdowns: positive
- Government, health organizations: negative
- Spreading the virus: positive
- Emojis, swear words: positive
- COVID19 statistics: split positive/negative
- Preventing infection: positive
- General opinions: negative
- Riots, BLM: neutral
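A minimal sketch of the topic-modeling step with gensim, assuming tokenized tweets as input (the stand-in documents and hyperparameters below are illustrative; the actual model is built in nlp_features_notebook.ipynb):

```python
from gensim import corpora, models

# 'docs' is a list of token lists, e.g. output of a tokenizer like the one above.
docs = [
    ["hand", "wash", "protect", "public"],
    ["hospital", "glove", "need", "help"],
]  # stand-in documents

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# 10 topics to match the project's model; passes/random_state are illustrative.
lda = models.LdaModel(
    corpus,
    num_topics=10,
    id2word=dictionary,
    passes=10,
    random_state=42,
)

for topic_id, words in lda.print_topics(num_topics=10, num_words=8):
    print(topic_id, words)
```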
- Topic modeling provided some interesting insights but was not helpful for predictive modeling.
- Some of the features that the prediction models weighted most heavily were surprising (a sketch of how such a ranking can be produced follows these lists):
- Subjectivity score
- Number of likes
- Number of retweets
- The words 'need', 'spread', 'protect', 'make', 'help', 'say', 'glove', 'public', 'hospital', and 'new'
- Ranked by importance score, the top ten features were:
- Subjectivity score (0.0611)
- Number of likes (0.0139)
- 'protect' (0.0132)
- 'help' (0.0129)
- 'infected' (0.0115)
- 'safe' (0.0094)
- 'please' (0.0083)
- 'death' (0.0083)
- 'hand' (0.0076)
- Number of replies (0.0072)
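As a rough sketch of how such a ranking can be produced, the snippet below combines word counts with the continuous features (TextBlob subjectivity plus Twitter stats) and reads importances off a scikit-learn decision tree. The file, column names, and hyperparameters are assumptions; the project's actual pipeline lives in nlp_features_notebook.ipynb and modeling_notebook.ipynb:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from textblob import TextBlob

# Hypothetical columns: tweet, likes, retweets, replies, sentiment.
df = pd.read_csv("tweets_labeled.csv")

# Bag-of-words features for the tweet text.
vectorizer = CountVectorizer(max_features=1000, stop_words="english")
word_counts = vectorizer.fit_transform(df["tweet"])

# Continuous features: subjectivity score and Twitter stats.
df["subjectivity"] = df["tweet"].apply(lambda t: TextBlob(t).sentiment.subjectivity)
extra = df[["subjectivity", "likes", "retweets", "replies"]].to_numpy()

X = hstack([word_counts, extra]).tocsr()
y = df["sentiment"]

tree = DecisionTreeClassifier(max_depth=10, random_state=42).fit(X, y)

# Rank every feature (words + continuous stats) by importance.
feature_names = list(vectorizer.get_feature_names_out()) + [
    "subjectivity", "likes", "retweets", "replies",
]
ranked = sorted(zip(feature_names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:10]:
    print(f"{name}: {score:.4f}")
```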
The overall sentiment of tweets was fairly evenly divided between positive and negative throughout the five months. Our prediction models produced some interesting results, namely that continuous variables like subjectivity score, number of likes, and number of replies were among the most important variables for predicting a tweet's sentiment. Other important features were high-frequency words. Given more time, we would try to improve accuracy with a deep learning model, such as an LSTM. Finally, we would like to further investigate sentiment toward the word mask (or masks) in particular, as opposed to the overall sentiment of the tweet as a whole.
- Images folder - charts and visualizations created during the project
- .gitignore - list of files and pathways to ignore
- README.md - this very file!
- data_cleaning_notebook.ipynb - notebook for compiling and cleaning our dataframes
- eda_visualizations_notebook.ipynb - notebook with EDA and chart/visualization creation
- functions.py - file with functions used in this project
- modeling_notebook.ipynb - notebook with Naive Bayes and Decision Tree models
- nlp_features_notebook.ipynb - notebook with text processing, LDA topic modeling, and subjectivity scoring
- twitter_scraping_notebook.ipynb - notebook detailing our scraping of tweets
- presentation.pdf - slides for our presentation of this project