Bhanu Final Report
Bachupally, Kukatpally, Hyderabad, Telangana, India, 500090
2024-2025
GOKARAJU RANGARAJU
INSTITUTE OF ENGINEERING AND TECHNOLOGY
(Autonomous)
CERTIFICATE
This is to certify that the major project phase-I work entitled “YouTube Study Mode: Chrome
Extension for Educational Content Filtering and Comment Analysis” is submitted by
Jadi Bhanu Prakash (21241A0520) in partial fulfillment of the requirements for the award of the
degree of BACHELOR OF TECHNOLOGY in Computer Science and Engineering during the
academic year 2024-2025.
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
Many people helped us directly and indirectly to complete our project successfully. We
would like to take this opportunity to thank one and all. First, we wish to express our deep
gratitude to our guide, Dr. K Anuradha, Professor, Department of CSE, for her support in
the successful completion of our project work Phase-I and for the timely guidance provided
throughout. The process made us learn a lot practically, and we are very thankful to the final-year
Project Coordinator, Dr. N. Krishna Chythanya, Asst. Prof., CSE, and our class project
coordinator, Ms. Vaibhavi Bharadwaj, Asst. Prof., CSE, for their unwavering support in
providing schedules, formats, and rubrics and in arranging seminars at regular intervals. Our sincere
thanks to Project Review Committee member Dr. G. Sakthidharan, Professor, CSE, for the
in-depth analysis during seminar evaluations that helped us improve the quality of our
work. We wish to express our honest and sincere thanks to Dr. B. Sankara Babu, HOD,
Department of CSE, to our Principal, Dr. J. Praveen, and to our Director, Dr. Jandhyala N
Murthy, for providing all the facilities required to complete our major project Phase-I. We
would like to thank all our faculty and friends for their help and constructive criticism during
the completion of this phase. Finally, we are very much indebted to our parents for their moral
support and encouragement in achieving our goals.
DECLARATION
We hereby declare that the major project phase-I entitled “YouTube Study Mode: Chrome
Extension for Educational Content Filtering and Comment Analysis” is the work done
during the period from 2024-2025 and is submitted in the partial fulfillment of the requirements
for the award of the degree of Bachelor of Technology in Computer Science and Engineering
from Gokaraju Rangaraju Institute of Engineering and Technology (Autonomous). The
results embodied in this phase-I of the project have not been submitted to any other University or
Institution for the award of any degree or diploma.
Table of Contents
4 4.1 Modules 20
4.2 Methodology Proposed for Solution Implementation. 21
4.3 Summary 22
Conclusion and Future Scope 23
5 5.1 Conclusion 23
5.2 Future Scope 23
6 References 24
Appendix
i) Research Article-Submission/Publication Proof
LIST OF FIGURES
Fig No Fig Title Page No
3.1 System Architecture Diagram 16
3.2 Use Case Diagram 17
3.3 Class Diagram 18
3.4 Sequence Diagram 19
4.1 Methodology for Solution Implementation 21
Abstract
This project aims to design a Google Chrome extension that filters educational content while
searching on YouTube. The extension identifies whether or not a video is educational in two
steps. In the first step, a machine learning model trained on a dataset of YouTube video titles
determines whether the content is educational or non-educational. In the second step, the
extension analyzes the sentiment of the video's comments using Natural Language Processing
(NLP) techniques to estimate the overall usefulness of the video. Upon opening a YouTube
video, the extension checks the video title against the trained model, and if it finds that the
content is not educational, it promptly alerts the user with a popup, encouraging them to stay
focused on educational material.
1.Introduction
1.1 Overview
This project focuses on the creation of a Google Chrome extension designed to
help students stay focused on educational content while browsing YouTube. For these users,
the biggest source of distraction on a platform like YouTube is non-educational video. Using
machine learning models, the extension categorizes YouTube videos, based on their titles, as
either educational or non-educational. Moreover, it applies NLP techniques to the comments of
a video to determine their sentiment, reflecting how useful the community considers the video
to be. The extension provides real-time feedback by notifying the student when a video is
unrelated to study material and motivating them to view content relevant to their learning
goals. This approach reduces distractions, cultivates productive use of YouTube, and improves
overall study efficiency, offering a practical solution to a problem common among students.
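The title-classification step described above can be sketched with scikit-learn (an assumed library; the report does not name one, and the tiny dataset below is purely illustrative):

```python
# Minimal sketch of title classification: bag-of-words features feeding a
# Naive Bayes classifier. The titles and labels are made-up examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

titles = [
    "Calculus 1 - Limits and Continuity Explained",   # educational
    "Linear Algebra Full Course for Beginners",       # educational
    "Top 10 Funniest Cat Videos of the Year",         # non-educational
    "Epic Prank Compilation 2024",                    # non-educational
]
labels = ["educational", "educational", "non-educational", "non-educational"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(titles, labels)

# A new title is classified from the words it shares with the training set.
print(model.predict(["Limits and Continuity Lecture"])[0])
```

In the real extension, a model like this would be trained on a much larger labeled title dataset and queried each time the user opens a video.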
1.2 Motivation
In today's digital world, YouTube has become one of the most widely used sources of both
educational and entertainment content. However, students are easily distracted by the non-
educational material, which drastically diminishes their productivity and their focus on their set
goals. The temptation to browse unrelated or merely entertaining videos pulls them away
from the study session. The motivation for this project lies in addressing this widespread issue
by creating a tool that enables better use of a student's time on YouTube.
The extension uses technologies such as machine learning and NLP to empower
students to focus on valuable, educational content. In addition, the tool serves the wider goal
of promoting digital wellbeing, since it facilitates more mindful and productive online activity.
1.3 Objectives
The primary objectives of the project are:
1. Develop a Chrome extension that automatically classifies YouTube videos as educational or
non-educational using machine learning models.
2. Improve these evaluations by analyzing video comments with NLP techniques, thereby
gaining an understanding of how the community perceives a video's usefulness.
3. Provide real-time alerts and feedback to users, enabling them to prioritize content that
aligns with their stated learning goals.
4. Decrease distraction and create a more focused online learning environment.
1.4 Challenges
2.Literature Survey
2.1 Introduction
This chapter surveys existing work relevant to the project: sentiment analysis of social media
text, machine learning approaches to text classification, and tools such as VADER and Naive
Bayes applied to user-generated comments. The surveyed studies cover key aspects such as
preprocessing techniques, classifier comparisons, and the analysis of YouTube comments,
providing the foundation on which the proposed extension is built.
In [1], the study "Twitter Sentiment Analysis During Covid-19 Outbreak with
VADER" analyzed crowd sentiment on Twitter regarding the COVID-19 outbreak from
1 January 2020 to 1 July 2020, examining trends in public opinion over that period. The study
applied VADER, a tool for analyzing sentiment on social media, to categorize tweets into five
sentiment categories: highly positive, positive, neutral, negative, and highly negative. Overall,
more than 60 million tweets were collected using COVID-19-related hashtags. Word clouds
and N-grams were used to visualize frequent terms and patterns within the data. A trend was
observed in which sentiment shifted toward more positive emotions as the pandemic
progressed, attributed to increased public awareness and adjustment to the situation. This study
gives policymakers ideas on how social perception of health crises can be managed and how
action plans can be created during emergencies.
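The N-gram frequency analysis mentioned above can be illustrated in plain Python (the study's own tooling is not specified; the two short texts are made-up stand-ins for tweets):

```python
# Count bigrams across a tiny corpus to find frequent word patterns,
# the same idea behind the study's N-gram visualizations.
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tweets = [
    "stay home stay safe",
    "stay safe everyone",
]
bigram_counts = Counter()
for tweet in tweets:
    bigram_counts.update(ngrams(tweet.split(), 2))

print(bigram_counts.most_common(1))  # → [(('stay', 'safe'), 2)]
```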
In [2], the paper describes Naive Bayes, a machine learning algorithm for text
classification that is especially helpful in categorizing unstructured data such as emails and
social media posts. Naive Bayes classifies text based on its features, assuming that the features
occur independently with given probabilities. The technique scales well with dataset size and
is computationally efficient. Its applications include spam filtering, sentiment analysis, and so
forth. Despite limitations such as the zero-frequency problem, Naive Bayes has proven effective
in most real-world tasks. The paper concludes that the technique is efficient for text
classification even on large datasets.
In [3], the paper discusses a text classification system based on the Naive Bayes
algorithm, one of the most popular machine learning techniques, which originates from
Bayes' Theorem. The Natural Language Processing with Disaster Tweets dataset is evaluated
under various pre-processing techniques, including cleaning, removal of stop words, and
lemmatization, to train a Naive Bayes classifier that labels tweets as describing real or fake
disasters. Laplace smoothing was applied to address the zero-frequency problem. The results
indicate that Naive Bayes is efficient, scalable, and useful for this type of text classification
task, making it a good candidate for real-world applications.
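The Laplace (add-one) smoothing used above can be shown directly: without it, a word never seen in a class would zero out the whole class probability. The counts below are illustrative, not from the paper:

```python
# Add-one smoothing for Naive Bayes word probabilities:
# P(word | class) = (count + 1) / (total words in class + vocabulary size)
def smoothed_prob(word_count, total_words_in_class, vocab_size):
    return (word_count + 1) / (total_words_in_class + vocab_size)

vocab_size = 1000
total_in_class = 5000

# An unseen word (count 0) no longer gets probability 0:
print(smoothed_prob(0, total_in_class, vocab_size))   # small but non-zero
print(smoothed_prob(49, total_in_class, vocab_size))
```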
In [4], the research paper "Sentiment Analysis Using Natural Language
Processing (NLP) and Machine Learning" reports on the role NLP and ML play in
understanding sentiment in digital communication. Much consideration is given to how
customer feedback needs to be analyzed in order to decide on business strategies. Techniques
include data processing with NLTK and the WordNet Lemmatizer, and models such as LSTM
Recurrent Neural Networks implemented in Python for sentiment analysis. The paper
concludes that these technologies enable businesses to effectively screen and review large
volumes of data, which in turn improves decision-making capabilities and customer
experience.
In [5], the paper entitled "YouTube Comment Analyzer Using Sentiment Analysis"
proposed an analyzer tool that judges user opinion from YouTube comments. Relying on
NLP and ML, the authors implemented the VADER algorithm, popular for its lexicon- and
rule-based approach, to classify comments as positive, negative, or neutral. The system scrapes
data through the YouTube API and pre-processes the text by cleaning special characters,
tokenizing, and lemmatizing. VADER's sentiment-scoring mechanism performed considerably
better than the Naive Bayes and Support Vector Machine algorithms. The study therefore
concluded that VADER is well suited to analyzing the informal, emotion-rich language of
YouTube comments and is a good tool for understanding user engagement.
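The lexicon-and-rule idea behind VADER can be sketched with a deliberately tiny toy version (the real VADER lexicon and rule set are far richer; the word valences and negation rule below only illustrate the mechanism):

```python
# Toy lexicon-and-rule sentiment scorer in the spirit of VADER:
# sum per-word valences, flip a word's valence after "not".
LEXICON = {"great": 3.1, "good": 1.9, "helpful": 1.8, "bad": -2.5, "boring": -1.3}

def toy_sentiment(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] == "not":  # simple negation rule
            valence = -valence
        score += valence
    if score > 0.05:
        return "positive"
    if score < -0.05:
        return "negative"
    return "neutral"

print(toy_sentiment("this tutorial is great and helpful"))  # positive
print(toy_sentiment("not helpful at all"))                  # negative
```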
In [6], "YouTube Comments Extraction and Sentiment Analysis Using NLP" describes
a system that applies NLP to the analysis of YouTube comments. The system first retrieves
comments using the YouTube Data API, then performs several pre-processing operations on
the text, such as cleaning and tokenization. Finally, comments are classified as positive or
negative using VADER, the tool applied for sentiment analysis. Pie charts and bar graphs
visualize how sentiment is distributed, and a real-time feedback system is provided as a web
application with email notifications. This supports the content-creation process by making
creators aware of what the audience thinks about a particular topic so that they can adjust
their content accordingly.
In [7], five popular machine learning classifiers are used: Naive Bayes,
Random Forest, Decision Tree, Support Vector Machines (SVM), and Logistic Regression,
implemented on Apache Spark with the aim of product review text classification.
The techniques include n-gram analysis incorporating unigrams, bigrams, and trigrams, and
text preprocessing steps including tokenization, stemming, and stop-word removal. Training
and testing were done using 10-fold cross-validation, and classifier accuracy was measured.
Among all classifiers, the maximum accuracy of 58.50% was achieved by logistic regression,
while the minimum accuracy of 24.10% was observed for the decision tree. Moderately
increasing the training-data size improved the accuracy of most classifiers to a good extent.
The study concludes that for this multi-class classification problem, logistic regression should
be employed to attain maximum accuracy, while the decision tree is the least effective.
In [8], this paper compares five classifiers, Naive Bayes, Random Forest, Decision
Tree, Support Vector Machines, and Logistic Regression, using Apache Spark for text
classification of product reviews. Techniques such as n-gram analysis (unigrams, bigrams,
trigrams) and text preprocessing, including tokenization, stemming, and stop-word removal,
were applied to characterize the language. Classifier performance was tested on Amazon
product review data with 10-fold cross-validation. The highest classification accuracy of
58.50% was achieved by Logistic Regression, while the lowest was shown by Decision Tree.
Logistic Regression was thus well suited for multi-class classification, whereas Decision Tree
performed quite poorly in the experiment. With increasing training-data size, accuracy
improved by only a slight margin.
In [9], the paper performs sentiment analysis of YouTube comments using machine
learning techniques. Lexicon-based, machine learning-based, and deep learning approaches
are used for the analysis. The overall focus is classifying each comment as positive, negative,
or neutral, with Recurrent Neural Networks and VADER used to derive emotions. The
preprocessing steps involve web scraping, tokenization, and various encoding strategies,
including one-hot and p-hot encoding. In short, a machine learning approach, specifically
through RNNs, improves the accuracy of sentiment classification, making these methods
important for understanding user reviews on YouTube.
In [10], reviews of TikTok obtained from the Google Play Store were analyzed for
sentiment using VADER and a Support Vector Machine model. Since the reviews were mostly
positive, the data was imbalanced, so both Random Under-sampling (RUS) and Random
Over-sampling (ROS) were applied. After preprocessing, the SVM model was trained; without
resampling, it produced an average F1-score of 0.80, showing that resampling failed to
improve the results. The paper concludes that sentiment analysis can help TikTok optimize
the user experience by showing where the product is succeeding and exactly where there is
room for improvement.
of Words model and the Association List approach. The technique relies on extracting
comments, cleaning them through pre-processing (tokenization and removal of stop words),
and applying feature extraction for classification. The study classifies comments as relevant
or irrelevant, and alternatively as positive or negative. For the two approaches, the results show
that the Bag of Words model performs better on positive comments, while the Association
List is the better choice for identifying irrelevant ones. Combining the two approaches within
a single system enhances the precision of comment classification, and the authors recommend
support for multilingual comments as future work.
In [14], this paper addresses sentiment analysis of YouTube comments using a hybrid
Naive Bayes-Support Vector Machine classifier. The researchers apply text mining techniques
such as case folding, tokenization, filtering, and stemming for comment preprocessing. Naive
Bayes works well on small datasets, while SVM is good at large ones, and combining the two
enhances classifier performance. Data was acquired from the YouTube API, of which 70%
was used for training the classifier and 30% was prepared for testing. The results show that
the hybrid NBSVM classifier achieved a precision of 91%, a recall of 83%, and an F1-score of
87%, demonstrating the validity of the hybrid model for sentiment analysis.
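The precision, recall, and F1 figures reported above are related by the standard F1 definition; a quick check with the paper's numbers confirms they are consistent:

```python
# F1 is the harmonic mean of precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.91, 0.83
print(round(f1_score(precision, recall), 2))  # → 0.87, matching the paper
```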
visualized in pie charts and word clouds, and the data is kept in a CSV file for further
processing. The hybrid model improves the accuracy and reliability of sentiment
identification, revealing valuable insights into the preferences and trends of the audience.
In [16], a survey on sentiment analysis of YouTube comments extends the
classification of user comments as positive, negative, or neutral using machine learning
techniques. The dataset includes 1,500 citation sentences that were cleansed using
normalization techniques such as lemmatization and stop-word removal. Six classifiers,
namely Naive Bayes, SVM, Logistic Regression, Decision Tree, K-Nearest Neighbor, and
Random Forest, were tested for accuracy and evaluated by F-score. The results revealed SVM
as the most effective model, especially with unigram features, followed by Naive Bayes.
Unigrams combined with SVM generally produced the highest performance, underlining the
value of structured text preprocessing and careful model selection for sentiment analysis.
In [17], a study focused on the Naive Bayes algorithm and its application to text
classification using news articles from 20 categories, totaling 18,846 articles. The
methodology involves preprocessing the dataset with TF-IDF and training a Multinomial
Naive Bayes classifier. Evaluation used a confusion matrix and yielded an accuracy score of
77.39%, with heatmaps illustrating the classification results. Naive Bayes proved effective
despite its simplicity, and with parameter tuning it can be further improved or combined with
other algorithms.
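The TF-IDF plus Multinomial Naive Bayes pipeline described above can be sketched with scikit-learn (an assumed library; the tiny two-category corpus below stands in for the much larger 20-category news collection):

```python
# TF-IDF features feeding Multinomial Naive Bayes, evaluated with
# accuracy and a confusion matrix as in the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "the team won the championship game",
    "the striker scored a late goal",
    "new cpu benchmarks released today",
    "the laptop ships with fast memory",
]
train_labels = ["sports", "sports", "tech", "tech"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

test_texts = ["the goalkeeper saved the game", "cpu and memory upgrades"]
test_labels = ["sports", "tech"]
pred = model.predict(test_texts)
print(accuracy_score(test_labels, pred))
print(confusion_matrix(test_labels, pred))
```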
In [18], this research utilized VADER to explore sentiment analysis of Albanian-
language social media posts about companies' services. The researchers developed a custom
lexicon of approximately 9,500 Albanian adjectives and verbs with which to classify
sentiment as positive, neutral, or negative. In addition to reducing noise in the datasets,
accuracy benefited from preprocessing steps such as removing non-Albanian words and
duplicates. Sentiment classification of the posts using the VADER model yielded accuracies
of 95.6% and 89.4% on the two companies' datasets. The results demonstrate the effectiveness
of VADER with custom lexicons on low-resource languages and its robust handling of
informal social media text.
In [19], this paper classifies sentiment into more than two categories using VADER
for sentiment analysis on Twitter. Data was gathered from 2,430 political tweets posted in the
context of the 2016 US presidential election. The research used NodeXL to collect the data,
while NLTK served as a preprocessing tool for cleaning the tweets. VADER was then applied
to categorize sentiment into five levels: highly positive, positive, neutral, negative, and highly
negative. VADER proved an appropriate tool for analyzing social media text; however, the
size of the dataset and the generic lexicon limit the accuracy. Future improvements could come
from more extensive datasets and more precise lexicons.
In [20], this paper performs a comparative analysis of machine learning algorithms for
text classification. It uses the IMDB and SPAM datasets, comprising movie reviews and SMS
messages, with the algorithms Support Vector Machine (SVM), k-Nearest Neighbor (k-NN),
Logistic Regression (LR), Multinomial Naive Bayes (MNB), and Random Forest (RF). The
process comprises data preprocessing followed by training and testing the classifiers. Results
indicate that k-NN gave the highest accuracy on the SPAM dataset at 98.5%, while LR
achieved the highest accuracy on the IMDB dataset at 85.8%. Generally, SVM and LR were
shown to be accurate with good recall. The experiments lead to the conclusion that algorithms
behave differently on different datasets; hyperparameter tuning and ensemble methods could
be considered in future work to further improve classification accuracy.
2.4 Problem Statement
Students struggle to learn well because YouTube contains an overwhelming amount of
distracting, non-educational video. The recommendation algorithm presents students with
videos that may add little knowledge but are entertaining, causing them to lose focus on
learning. Moreover, many videos have attractive titles and comments whose reliability cannot
easily be judged, which tends to undermine the credibility of educational content. The
resulting fuzziness between useful and distracting videos leaves students unable to make
informed decisions and stay focused on their studies.
2.6 Summary
The literature survey evaluated existing research on sentiment analysis and text
classification, focusing on techniques such as VADER, Naive Bayes, SVM, and Logistic
Regression applied to social media and YouTube comments. It highlights developments in
lexicon-based, machine learning, and deep learning approaches, along with preprocessing
steps such as tokenization, stop-word removal, and lemmatization, and notes recurring
challenges such as imbalanced data, limited lexicons, and dataset size. This review serves as
the basis for the development of an effective, data-driven extension that classifies video titles
and analyzes comment sentiment.
3. Software Requirements Specification
3.1 Introduction
3.1.1 Purpose of the Requirements Document
The purpose of this document is to provide a detailed description of the functional and non
functional requirements for the "YouTube Study Mode" Chrome extension. It serves as a
reference for developers, stakeholders, and testers, outlining the project's goals, features, and
behavior to ensure the product meets its intended purpose of aiding students in focusing on
educational content by filtering non-educational videos and analyzing the usefulness of
comments on educational videos.
3.1.2 Scope of the Product
The "YouTube Study Mode" extension is designed to improve students’ focus on educational
content on YouTube by filtering non-educational videos, analyzing comments for educational
videos, and forcefully pausing distracting content. This extension will classify videos based
on their titles and evaluate comments’ sentiment for relevance, providing insights into a video’s
usefulness. Additionally, it will provide user control over classification sensitivity and video
playback interruptions.
3.1.4 References
The development of the YouTube Study Mode extension is guided by credible sources and
industry standards, including:
1. Chrome Extension Development Documentation - Chrome Developer Site
2. YouTube API Documentation - YouTube API Docs
3. Text Classification and Sentiment Analysis Literature - Relevant textbooks and articles on
NLP and ML techniques.
3.1.5 Overview of Document Structure
The document will cover the following:
General description of the product
Specific functional and non-functional requirements
Interfaces and interactions for end-users and developers
Constraints and assumptions involved in development
Future scope and potential enhancements
3.2.3 User Characteristics
General Users
Description:
General users primarily include students and individuals who aim to focus on
educational content while using YouTube for study purposes. These users rely on the
extension to filter non-educational content and guide them toward productive learning
resources.
Key Needs:
• Minimize distractions from non-educational videos on YouTube.
• Identify and assess the relevance of educational videos based on sentiment
analysis of comments.
• Easily customize settings to adjust classification sensitivity and playback
restrictions.
Technical Skills:
Basic computer literacy and familiarity with browser extensions are required. Users
are not expected to have any programming or machine learning knowledge.
Researchers and Developers
Description: Researchers and developers working on this project may include individuals
with a background in software development, machine learning, or natural language
processing. They work on improving the accuracy of the classification and sentiment analysis
models, enhancing the extension’s functionality, and refining user interface elements.
Key Needs:
Access to training data and algorithms used for classification and sentiment analysis.
Ability to test and refine the ML model for better accuracy in classifying educational
vs. non-educational content.
Tools and resources for integrating model updates and adjusting extension settings
based on user feedback.
Technical Skills:
Intermediate to advanced programming skills (JavaScript, Python) and experience
with machine learning, particularly in text classification and sentiment analysis.
Familiarity with APIs, particularly the YouTube Data API, and Chrome extension
development is also recommended.
3.2.4 General Constraints
1. Internet Dependency: Requires internet access to retrieve data from YouTube’s servers
and perform online classification or sentiment analysis if cloud-based.
2. API Limitations: Subject to YouTube API limitations on data requests.
3. Chrome Compatibility: This extension is currently compatible only with Google
Chrome and Chrome-based browsers.
4. Processing Overhead: The classification and analysis functions may add processing
time, especially on lower-end devices.
3.2.5 Assumptions and Dependencies
It is assumed that users will have an active YouTube account and access to the Chrome
browser.
The system relies on accurate API responses from YouTube for retrieving video
metadata and comments.
The product depends on local or cloud-hosted machine learning models, which may
require periodic updates to remain effective in classification and sentiment analysis.
The classification and sentiment analysis algorithms assume that text-based titles and
comments provide enough information for accurate categorization.
Video classification should occur within a second, and sentiment analysis within 2–3
seconds, with minimal impact on YouTube’s loading time.
3.3.2.4 Scalability
The model must handle frequent API requests efficiently and support expansion to other
platforms or complex analyses like sentiment analysis of replies.
3.3.2.5 Model Accuracy
Video title classification should achieve at least 85% accuracy, while sentiment analysis
should reach 80% accuracy for reliable educational content evaluation.
3.3.3 User Interface Requirements
• Popups shall appear near the mouse pointer when hovering over video thumbnails,
providing video classification and sentiment information.
• The interface shall include options to override classifications, view sentiment analysis
results, and toggle features such as forced video pauses for non-educational content.
• User preferences shall include per-video and per-channel classification overrides,
classification sensitivity settings, and the pause behavior applied to non-educational
content.
3.3.5 Hardware Requirements
The hardware requirements for developing the project on a laptop are:
CPU: Intel i5 (10th generation) or higher; AMD Ryzen 7 (5000 series) or higher.
Hard disk: 256 GB SSD storage or higher.
3.4 Architecture and UML Diagrams
3.4.1 Architecture Diagram
3.4.2 Use Case Diagram
3.4.3 Class Diagram
3.4.4 Sequence Diagram
3.5 Summary
This chapter outlined the detailed system requirements, including functional and non-functional
aspects, user interface needs and hardware requirements. These specifications provide the
foundation for the subsequent stages of system design and implementation.
4.Methodology
4.1 Modules
The project is segmented into the following key modules:
1. User Interface Module: An interactive front-end where users are presented with filtered
educational content.
2. Model Training Module: Trains the machine learning models, which incrementally improve
classification accuracy based on labeled datasets and user feedback.
3. YouTube API Module: Retrieves video data from the YouTube API, including titles,
descriptions, and comments, for real-time analysis.
4. Title Classification and Comment Analysis Module: Uses Natural Language Processing to
classify videos for educational value based on their titles and user comments.
5. Popup Display Module: Manages notifications shown to the user when non-educational
content is detected, as well as the results of the viewer-sentiment analysis.
4.2 Methodology Proposed for Solution Implementation
To implement the proposed solution, a methodological framework is designed as follows:
Data Collection
To build a robust classification and sentiment analysis system, video title datasets
containing both educational and non-educational titles are collected. These datasets are
used to train the classification model. Similarly, datasets with user comments are
gathered to train the sentiment analysis model, enabling the system to understand and
predict the sentiment behind user feedback (Positive, Neutral, or Negative).
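The collection-to-training flow for the sentiment model can be sketched with scikit-learn (an assumed library; the hand-labeled comments below are illustrative, whereas real training would use a much larger collected corpus):

```python
# Train a three-class (Positive / Neutral / Negative) comment sentiment
# model from a small labeled dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = [
    "this lecture really helped me understand the topic",  # Positive
    "great explanation thank you",                         # Positive
    "worst video ever waste of time",                      # Negative
    "confusing and badly explained",                       # Negative
    "video starts at two minutes",                         # Neutral
    "watching this before my exam",                        # Neutral
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

sentiment_model = make_pipeline(CountVectorizer(), MultinomialNB())
sentiment_model.fit(comments, labels)

print(sentiment_model.predict(["great lecture really helped"])[0])
```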
Model Training
Integration with YouTube
The YouTube Data API and DOM manipulation techniques are utilized to retrieve
video metadata, including titles and comments. These elements are processed in real
time using the trained models, enabling the extension to classify videos dynamically
and provide insights during user interactions with the platform.
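The metadata retrieval step can be sketched as follows. The endpoint and parameters are the documented YouTube Data API v3 ones; `API_KEY` and `VIDEO_ID` are placeholders, and the actual network call is left out of this illustration:

```python
# Build a YouTube Data API v3 request URL for a video's top-level comments.
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"

def comment_threads_url(video_id, api_key, max_results=20):
    """Construct the commentThreads.list request URL for a given video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{API_BASE}/commentThreads?{urlencode(params)}"

# In a real client, fetch this URL (e.g. with urllib.request.urlopen) and
# read each item's snippet.topLevelComment.snippet.textDisplay field.
print(comment_threads_url("VIDEO_ID", "API_KEY"))
```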
Testing
The system undergoes extensive testing to ensure its reliability. Title classification
accuracy is evaluated by testing the system with diverse YouTube videos, while
sentiment analysis results are validated using real-world user comments. Usability
testing is conducted to refine the extension's interface and ensure a seamless user
experience.
Deployment
The trained machine learning models are hosted on either a local server or a cloud-
based platform to facilitate smooth API integration. Once ready, the Chrome extension
is deployed, making it easily accessible to end users, thus enabling them to improve
their learning efficiency on YouTube.
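A minimal sketch of hosting the trained classifier behind an HTTP endpoint, using only the standard library's WSGI support (a real deployment would more likely use a framework such as Flask and load a serialized model; `classify_title` below is a keyword-based stand-in for the trained model):

```python
# Serve classification results as JSON from /classify?title=...
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

def classify_title(title):
    # Stand-in for the trained model's prediction.
    keywords = ("lecture", "tutorial", "course", "explained")
    return "educational" if any(k in title.lower() for k in keywords) \
        else "non-educational"

def app(environ, start_response):
    query = parse_qs(environ.get("QUERY_STRING", ""))
    title = query.get("title", [""])[0]
    body = json.dumps({"title": title, "label": classify_title(title)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# make_server("", 8000, app).serve_forever()  # uncomment to actually serve
# The WSGI app is a plain callable, so it can also be exercised directly:
result = app({"QUERY_STRING": "title=Calculus+Lecture+1"}, lambda s, h: None)
print(result[0].decode())
```

The extension would then call this endpoint with the current video's title and show the returned label in its popup.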
4.3 Summary
The methodology of the YouTube Study Mode project centers on machine learning models
and seamless integration with YouTube to filter non-educational content and analyze the
usefulness of educational videos. The solution is spread across key modules for classification,
sentiment analysis, video control, and the user interface, ensuring modularity and scalability.
Data collection, model training, extension development, and deployment are all aligned within
the proposed methodology. This systematic approach is meant to remove distractions from
student learning, allowing students to focus on meaningful and productive learning
experiences.
5.Conclusion and Future Scope
5.1 Conclusion
This project addresses a common problem facing students: losing sight of educational
content amid the non-educational videos on YouTube. By combining machine learning
techniques to classify titles with sentiment analysis of video comments, the extension helps
students quickly decide whether a video is educationally relevant. It supports focus by
forcefully pausing non-learning videos, allowing students to use their time properly and shape
their learning environment. The project promises to be an effective solution for increasing
productivity and improving learning efficiency in a world saturated with digital distractions,
and with continual improvements and updates to the classification models, the extension can
become an even more effective tool for keeping students on track with their educational goals.
6. References
1. "Machine Learning for Content Classification: A Survey," Journal of Machine Learning
Research, 2016. Provides a comprehensive overview of machine learning techniques for
content classification, essential for training the model to classify video titles.
2. "Natural Language Processing (NLP) Techniques for Sentiment Analysis," IEEE
Transactions on Affective Computing, 2018. Discusses various NLP techniques for sentiment
analysis, crucial for analyzing video comments to determine their usefulness.
3. "Building Chrome Extensions: A Guide," Google Developers Documentation, 2021.
Provides essential information on developing Chrome extensions using HTML, CSS,
JavaScript, and Chrome APIs, critical for creating and deploying the extension.
4. Ahmed, M. E., Rabin, M. R. I., & Chowdhury, F. N. (2020). COVID-19: Social Media
Sentiment Analysis on Reopening. arXiv preprint arXiv:2006.00804.
5. China in Times of Coronavirus. Available at SSRN: https://ssrn.com/abstract=3608566 or
http://dx.doi.org/10.2139/ssrn.3608566
6. C. Kaur and A. Sharma (2020). "EasyChair Preprint: Twitter Sentiment Analysis on
Coronavirus using Textblob."