Sentiment Analysis of IMDb Movie Reviews Using LSTM (Long Short-Term Memory) Neural Networks
- MEMBER 1
- Name: Kabiir Krishna
- Email: k7krishn@uwaterloo.ca
- WatIAM: k7krishn
- Student Number: 21106092
In the digital age, online reviews significantly
influence decisions about movies and TV
shows. This project explores the use of Long
Short-Term Memory (LSTM) neural networks
for sentiment analysis of IMDb reviews. Using
DistilBERT and VADER, we generate continuous
sentiment scores ranging from -1 to +1 for
our training dataset. The LSTM model is trained
on these scores so that it can exploit the
sequential nature of textual data when identifying
consensus and overall sentiment across reviews.
This approach helps the entertainment industry and its audience understand viewer preferences, guiding marketing strategies, recommendation systems, and content creation. Consumers benefit from the wisdom of the crowd, which helps them make better choices.
The technology can also be extended to other areas such as product reviews and social media monitoring. Experiments show that the model effectively captures and analyzes sentiment from large-scale data, demonstrating the potential of sentiment analysis to improve decision-making and tailor content to audience expectations.
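As a rough illustration of what "consensus and overall sentiment" can mean for a set of per-review scores in [-1, +1], the sketch below summarizes a list of scores using only the standard library. The function name `summarize_scores` and the `neutral_band` cutoff are hypothetical and are not taken from the project code; the dashboard may aggregate scores differently.

```python
from statistics import mean, pstdev


def summarize_scores(scores, neutral_band=0.05):
    """Summarize per-review sentiment scores that lie in [-1, 1].

    neutral_band is an assumed cutoff: scores within +/- neutral_band
    are counted as neutral rather than positive or negative.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    positive = sum(1 for s in scores if s > neutral_band)
    negative = sum(1 for s in scores if s < -neutral_band)
    neutral = len(scores) - positive - negative
    total = len(scores)
    return {
        "mean": mean(scores),        # overall sentiment
        "spread": pstdev(scores),    # low spread suggests reviewers agree
        "positive_share": positive / total,
        "negative_share": negative / total,
        "neutral_share": neutral / total,
    }


summary = summarize_scores([0.8, 0.6, -0.2, 0.0, 0.9])
```

A mean near +1 with a small spread would indicate a strongly positive consensus, while a mean near 0 with a large spread would indicate divided opinion.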
Project Root/
│   .gitignore
│   00-scrape.py
│   01.1-clean_score.py
│   01.2-Training_DistilBERT.ipynb
│   02-msci641_project.ipynb
│   03-sentiment_finetuning_w_distilbert.ipynb
│   04-dashboard.py
│   load_model_score.py
│   requirements.txt
│   README.md
│
├───media
│       # Contains media assets for README.md
│
├───models
│   ├───00-baseline
│   │       lstm.pt
│   │       vocab.pth
│   │
│   └───02-final
│           lstm_final.pt
│           vocab.pth
│
└───reviews
    ├───00-scraped
    │       reviews_tt0111161.csv
    │       reviews_tt0455944.csv
    │       reviews_tt0468569.csv
    │       reviews_tt15398776.csv
    │
    └───01-cleaned_scored
        ├───VADER
        │       cleaned_scored_reviews_tt0111161.csv
        │       cleaned_scored_reviews_tt0455944.csv
        │       cleaned_scored_reviews_tt0468569.csv
        │       cleaned_scored_reviews_tt15398776.csv
        │
        └───VADER_DISTILBERT_FINAL
                vader_dbert_scored.csv
- 00-scrape.py: Contains the logic for scraping reviews. This script generated the initial scraped data, which was later cleaned and scored with 01.1-clean_score.py. To run it separately, type: python3 00-scrape.py
- 01.1-clean_score.py: Cleans the reviews scraped by the previous script, scores them with VADER, and writes the cleaned and scored .csv files to reviews/01-cleaned_scored/VADER/. To run it separately, type: python3 01.1-clean_score.py
- 01.2-Training_DistilBERT.ipynb: Notebook for training the DistilBERT model, a strong classifier.
- 02-msci641_project.ipynb: Training script for the LSTM model. It also exports the model for later use.
- load_model_score.py: Contains the LSTM definition (needed to load the model) and a function that scores reviews using a pre-loaded model.
- 04-dashboard.py: Contains the main logic for the dashboard web page.
- 03-sentiment_finetuning_w_distilbert.ipynb: Loads the DistilBERT model trained with 01.2-Training_DistilBERT.ipynb and uses it to augment the VADER-scored reviews.
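This README does not spell out how the VADER and DistilBERT outputs are combined, so the following is only a minimal sketch of one plausible scheme, not the notebook's actual method. It assumes the classifier yields a probability of the positive class in [0, 1], rescales it to [-1, +1], and takes a weighted average with VADER's compound score (which already lies in [-1, +1]). The names `distilbert_to_signed`, `blend_scores`, and the `weight` parameter are hypothetical.

```python
def distilbert_to_signed(p_positive):
    """Map a classifier's P(positive) in [0, 1] to a signed score in [-1, 1]."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be in [0, 1]")
    return 2.0 * p_positive - 1.0


def blend_scores(vader_compound, p_positive, weight=0.5):
    """Weighted average of a VADER compound score and a signed classifier score.

    weight is an assumed mixing factor: 0 keeps only VADER,
    1 keeps only the classifier.
    """
    if not -1.0 <= vader_compound <= 1.0:
        raise ValueError("vader_compound must be in [-1, 1]")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    signed = distilbert_to_signed(p_positive)
    return (1.0 - weight) * vader_compound + weight * signed


# A mildly positive VADER score combined with a confident positive classifier.
blended = blend_scores(vader_compound=0.5, p_positive=0.9)
```

Because both inputs lie in [-1, +1] and the weights sum to 1, the blended score stays in the same range as the training targets described above.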
- reviews/00-scraped/: Contains the initial reviews scraped by the web crawler.
- reviews/01-cleaned_scored/VADER/: Contains the cleaned reviews that were scored by VADER only.
- reviews/01-cleaned_scored/VADER_DISTILBERT_FINAL/: Contains the reviews scored by DistilBERT together with the final fine-tuned (VADER + DistilBERT) score for each review. This was the final training data used to train the model.
- models/: Contains the initial and final LSTM models used in this project. (The DistilBERT model is not included in this repo because it was too large to push.)
NOTE: Although it is not required to run the project, the DistilBERT model can be regenerated locally by running the notebook 01.2-Training_DistilBERT.ipynb.
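Each model directory pairs the LSTM weights with a vocab.pth file. The format of that file is not described here, so the sketch below only illustrates the general idea of a vocabulary in an LSTM text pipeline: tokens are mapped to integer IDs and each review is padded or truncated to a fixed length before it reaches the network. The tokenizer, the special tokens, and the functions `build_vocab` and `encode` are hypothetical and use plain Python rather than the project's actual loader.

```python
import re

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts, min_freq=1):
    """Assign an integer ID to every token seen at least min_freq times."""
    counts = {}
    for text in texts:
        for tok in tokenize(text):
            counts[tok] = counts.get(tok, 0) + 1
    vocab = {PAD: 0, UNK: 1}
    for tok in sorted(counts):  # sorted for a deterministic ID order
        if counts[tok] >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert text to a fixed-length list of token IDs (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["great movie", "bad movie"])
encoded = encode("great unknown movie", vocab, max_len=5)
```

Words missing from the vocabulary map to the unknown-token ID, which is why the same vocabulary used during training has to be loaded alongside the model at scoring time.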
- Clone the repository
  git clone https://github.com/Kabiirk/MSCI641_project.git
- Navigate to the project root
  cd MSCI641_project
- Install dependencies
  pip install -r requirements.txt
- Run the dashboard
  streamlit run 04-dashboard.py
  This automatically opens a new browser window/tab with the dashboard served on localhost.
When the project is first run, the dashboard loads the pre-scraped reviews of the movie "The Equalizer" so that sample visualizations are visible immediately. To start real-time scraping and analysis of any other title, follow these steps:
- Type the movie/TV-series ID as listed on IMDb into the text box (under "IMDb ID") at the top and press the Scrape Data button.
- A live progress bar appears, indicating the status of operations (Scraping, Scoring, etc.).
- Once the reviews are loaded and the analysis is done, the dashboard updates the existing visualizations with the new reviews scored by the model.
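The IDs in this repository (for example tt0111161 and tt15398776) follow IMDb's "tt" plus digits pattern. A dashboard could check the user's input before scraping starts, as in the sketch below. It assumes at least seven digits after "tt", based only on the example filenames, and it reuses the reviews/00-scraped/reviews_<id>.csv naming from the project tree. The helpers `normalize_imdb_id` and `scraped_csv_path` are hypothetical and are not part of 04-dashboard.py.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: "tt" followed by at least seven digits.
IMDB_ID = re.compile(r"tt\d{7,}")


def normalize_imdb_id(raw):
    """Trim and lowercase user input, then check it looks like an IMDb title ID."""
    candidate = raw.strip().lower()
    if not IMDB_ID.fullmatch(candidate):
        raise ValueError(f"not a valid IMDb title ID: {raw!r}")
    return candidate


def scraped_csv_path(imdb_id, root="reviews/00-scraped"):
    """Build the CSV path for a title, following the naming in the project tree."""
    return PurePosixPath(root) / f"reviews_{normalize_imdb_id(imdb_id)}.csv"


path = scraped_csv_path(" TT0455944 ")
```

Rejecting malformed IDs up front avoids starting a scrape that cannot succeed.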
Note: The demo video has been trimmed (and sped up) for brevity. Each time the user submits a movie/TV-show ID, the project scrapes the latest reviews afresh. After building the DataFrame of reviews, the script calls pandas' apply() function on the Reviews column of the new DataFrame, which can take some time depending on the compute resources of the machine running the dashboard. For quick results, choose movies with fewer reviews.