Cyberbullying Detection Using Sequential Models
Lakshmi Amrutha Valli P1, G Neha Pranavi2, Prathap Adimoolam3, Chinta Venkata Murali Krishna4, Chaitanya Jannu5, Veeraswamy Parisae6, Yalamanchili Arpitha7
1,5,6,7Department of ECE, 2,4Department of CSE-DS, 3Department of CSE-AIML, NRI Institute of Technology, Agiripalli, India.
1lakshmiamruthavallipamidi@gmail.com, 2neha.gajjala@gmail.com, 3adimoolam.prathap@gmail.com, 4muralikrishna_chinta2007@yahoo.co.in, 5pvspj3@gmail.com, 6veera2u@gmail.com
https://scholar.google.com/citations?user=q4wbCqsAAAAJ&hl=en&oi=sra
Abstract
Bullying refers to unwanted behaviour that harms another person physically, mentally, or socially. Cyberbullying, also known as online bullying, includes textual or visual bullying. There is a pressing need to detect cyberbullying in today's world, as its prevalence is growing and leading to mental health issues. Cyberbullying has previously been detected using traditional machine learning algorithms. However, recent studies show that deep learning outperforms traditional machine learning methods in detecting cyberbullying for several reasons, such as handling large amounts of data, effectively categorising text and images, and automatically extracting features by means of hidden layers. This study examines the surveys that have already been conducted and points out gaps in the research. We propose a deep learning-based hybrid architecture, LSTM-BiLSTM-GRU with an attention mechanism, for cyberbullying detection and compare it with other models, covering various deep learning-based frameworks and data representation strategies. Finally, the method was assessed using popular performance metrics such as F1-score, recall, accuracy, and precision. Compared with the state of the art, the suggested method achieved superior performance with an accuracy of 93.69%, which is encouraging. Current DL-based methods for detecting cyberbullying have been critically examined, and their noteworthy contributions and suggested avenues for further research have been noted.
Keywords: Cyber Bullying, Recurrent Neural Network, Long Short-Term Memory, Gated
Recurrent Unit.
1. Introduction
Cyberbullying, often known as cyber harassment, is bullying that takes place online [1]. These days, we see several types of cyberbullying; writing offensive language and disseminating offensive images, such as memes, are two examples. Social networking sites like Facebook, Instagram, Twitter, and others have made it simpler for us to connect with people, generate content, and communicate. Bullying on many social media platforms, however, can result from the unfiltered exchange of message content and a lack of privacy protection [2]. As of April 2024, data on monthly active users show the most popular social networks worldwide. These channels have become essential for daily communication, particularly among younger generations [3]. However, their broad use has resulted in more incidents of cyberbullying, and these data demonstrate its prevalence across multiple platforms. Furthermore, there have been worrisome reports of cyberbullying contributing to teenage suicides, emphasising the critical need for better detection and prevention techniques. Cyberbullying can take many different forms, such as flaming, making hateful statements, sending inappropriate emails, publishing degrading images, making cruel comments, and pestering people through blogs and social media. Bullies can cause serious problems like depression, which can even lead to suicide [4,5].
It is critical to identify cyberbullying to prevent this dangerous issue. Unlike approaches that depend on the position of salient features within a phrase, an RNN can automatically learn features from the input text. In Eq. (1), Xi ∈ Rk denotes the word vector in k dimensions corresponding to the i-th word within a sentence of length n [26].
The sigmoid and hyperbolic tangent (tanh) functions generate feature maps from the input sequence:
Cj = f(W · Xj:j+h−1 + BC)                                                                     (2)
In Eq. (2), Cj represents the j-th feature, Xj:j+h−1 is the window of h consecutive word vectors, BC is the bias term, and f is the activation function.
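To make the notation concrete, the toy NumPy sketch below builds a random sentence matrix of n word vectors in Rk (Eq. (1)) and computes one feature per sliding window of h words (Eq. (2)); the dimensions, tanh activation, and random weights are illustrative assumptions, not settings used in this study.

```python
import numpy as np

n, k, h = 6, 5, 3                        # sentence length, embedding size, window size (assumed)
rng = np.random.default_rng(42)
X = rng.standard_normal((n, k))          # X_i in R^k for each of the n words, Eq. (1)
W = rng.standard_normal(h * k)           # filter weights applied to each word window
B_C = 0.0                                # bias term

# One feature C_j per window X_{j:j+h-1}, Eq. (2), with tanh as the activation f
C = np.array([np.tanh(W @ X[j:j + h].ravel() + B_C) for j in range(n - h + 1)])
print(C.shape)                           # (n - h + 1,) feature values
```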
LSTMs solve the vanishing gradient problem of traditional RNNs. They are well suited to applications like text classification and predictive modelling because of their large memory capacity. Such a network selectively determines which information must be passed on to subsequent neurons and which may be forgotten or omitted. These networks are trained with backpropagation through a gated mechanism. The following equations describe the input gate (IGt), output gate (OGt), and forget gate (FGt) that are fundamental parts of an LSTM network [27].
The forget gate controls how cell-state information is filtered. This stage removes any information that is unnecessary or less critical for the LSTM, which is essential for optimising the network's output. Ht−1 refers to the hidden state produced by the preceding cell (the last cell's output), and Xt is the input at the current time step. The inputs are multiplied by weight matrices and a bias is added. The result is passed through the sigmoid function, producing a vector with one value, between zero and one, for each element of the cell state. A value of '0' means the forget gate discards that piece of information from the cell state, whereas a '1' means the forget gate retains it entirely. Finally, this vector is multiplied element-wise with the cell state. Bidirectional LSTM (Bi-LSTM) is a reliable extension in which processing moves both forward and backward: a Bi-LSTM processes inputs in order and in reverse. Architecturally, it combines two LSTMs running in opposite directions. This enables the network to carry information from the past to the future through the forward layer and from the future to the past through the backward LSTM layer.
FGt = σ(WFG · [Ht−1, Xt] + BFG)                                                               (3)
IGt = σ(WIG · [Ht−1, Xt] + BIG)                                                               (4)
CGt = tanh(WCG · [Ht−1, Xt] + BCG)                                                            (5)
CSt = FGt * CSt−1 + IGt * CGt                                                                 (6)
OGt = σ(WOG · [Ht−1, Xt] + BOG)                                                               (7)
Ht = OGt * tanh(CSt)                                                                          (8)
where FG is the forget gate, IG the input gate, CG the control gate, and OG the output gate.
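As a worked illustration of Eqs. (3)-(8), the minimal NumPy sketch below performs a single LSTM time step; the hidden and embedding sizes, random weights, and zero initial states are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, cs_prev, W, B):
    """One LSTM time step following Eqs. (3)-(8)."""
    z = np.concatenate([h_prev, x_t])        # [H_{t-1}, X_t]
    fg = sigmoid(W["FG"] @ z + B["FG"])      # forget gate, Eq. (3)
    ig = sigmoid(W["IG"] @ z + B["IG"])      # input gate, Eq. (4)
    cg = np.tanh(W["CG"] @ z + B["CG"])      # control (candidate) gate, Eq. (5)
    cs = fg * cs_prev + ig * cg              # new cell state, Eq. (6)
    og = sigmoid(W["OG"] @ z + B["OG"])      # output gate, Eq. (7)
    h = og * np.tanh(cs)                     # new hidden state, Eq. (8)
    return h, cs

k, d = 8, 4                                  # embedding size and hidden size (assumed)
rng = np.random.default_rng(0)
W = {g: rng.standard_normal((d, d + k)) for g in ("FG", "IG", "CG", "OG")}
B = {g: np.zeros(d) for g in ("FG", "IG", "CG", "OG")}
h, cs = lstm_step(rng.standard_normal(k), np.zeros(d), np.zeros(d), W, B)
print(h.shape, cs.shape)                     # (4,) (4,)
```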
The forward layer (h→) computes over the given input sequence, whereas the backward layer (h←) computes over the reversed sequence. The output of this model is given by:
Yt = yt−n, ..., yt−1, yt, yt+1, ..., yt+n−1, yt+n                                             (9)
where yt = σ(h→, h←) and σ is the concatenation operator.
                                    Fig. 3: Bi-LSTM Architecture [28]
Gated Recurrent Units (GRUs) are another type of RNN that uses a gated approach to deal with the vanishing and exploding gradient problems. They outperform typical RNNs in terms of test accuracy due to their capacity for retaining long-term dependencies. GRUs are a streamlined variant of LSTM networks that can update or reset their memory cells. The update gate combines the roles of the input and forget gates seen in LSTMs, and a reset gate refreshes the memory contents. GRUs are lightweight and require fewer parameters than LSTMs. For an input vector Xt at time t, the update gate, reset gate, hidden state, and candidate hidden state are given by:
UGt = σ(WUG · [Ht−1, Xt])                                                                     (10)
RGt = σ(WRG · [Ht−1, Xt])                                                                     (11)
Ht = (1 − UGt) * Ht−1 + UGt * H′t                                                             (12)
H′t = tanh(WH · [RGt * Ht−1, Xt])                                                             (13)
In the domain of text-based cyberbullying categorisation, a hybrid approach that combines the benefits of LSTM, Bi-LSTM, and GRU can address their individual shortcomings while improving performance. This hybrid technique uses LSTM's pattern recognition capabilities, Bi-LSTM's comprehension of long-range dependencies, and GRU's memory efficiency to offer a comprehensive solution for effectively detecting possible cyberbullying in text. The GRU modules analyse the input data at each time step, generating hidden states that encode the input pattern. These hidden states are then passed to a fully connected layer, which produces predictions based on the learnt weights. The predicted values are compared with the true target labels, and any errors are backpropagated to update the weights, increasing accuracy over time.
Fig. 4: Proposed Hybrid Framework
The Bi-LSTM layer, composed of forward- and backward-looking LSTM layers,
analyses the RNN-generated feature vector sequences, capturing the input data's long-term
relationships. A Bi-LSTM, unlike a conventional LSTM, analyses the input sequence both
forwards and backwards, allowing it to gather information from previous and future time
steps. The Bi-LSTM's hidden states are then passed through a fully connected layer, which generates final predictions based on the learnt weights. The predictions are compared with the true target labels, and any errors are backpropagated to update the weights, increasing accuracy over time. In this work, we suggest combining the Bi-LSTM network with a stacked attention model. As the name suggests, the attention model [29] focusses on words that are more important in the document. Figure 5 shows the proposed design, which involves processing the input through the Bi-LSTM network, passing it through an attention layer with numerous neurons, and finally to the GRU layers. Understanding the context and improving the final output allows the system to encode only selectively valuable information. This enables the model to work properly with suitably large input texts. We use the multi-head attention method introduced in [29]. The model assigns non-zero weights to every input item, and we use the scaled dot product as the similarity function. To calculate the attention score for a query, the key-value pairs (K) are compared with the query to determine their similarities, as expressed in Eq. (15). To determine the final attention, given by Eq. (16), the weights are normalised using a SoftMax function, with dm serving as the key dimension.
Attention(Input, Set of Keys) = Σj=1..lx Similarity(Input, Keyj) × Valuej                     (14)
Similarity(Input, Keyj) = (Input · Keyj) / √dm                                                (15)
Attention(I, K, V) = SoftMax(I K^T / √dm) V                                                   (16)
Where, Input = The relevant information required by the element.
Set of Keys = The complete set of keys and values.
Key = Elements in the complete set that are compared to the input.
Value = The information linked with each key contributes to the result.
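A minimal Keras sketch of the hybrid stack described above (embedding → LSTM → Bi-LSTM → attention → GRU → dense classifier), assuming the TensorFlow/Keras toolchain listed in the experimental setup. The layer sizes, number of attention heads, vocabulary size, and sequence length are assumptions rather than the exact hyper-parameters used here; Keras's MultiHeadAttention layer internally applies the scaled dot-product attention of Eq. (16).

```python
from tensorflow.keras import layers, Model

VOCAB_SIZE, MAX_LEN, EMB_DIM = 20000, 100, 100             # assumed values

inputs = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)           # pre-trained GloVe weights could be supplied here
x = layers.LSTM(64, return_sequences=True)(x)               # long-term dependencies
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # past and future context
x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(query=x, value=x, key=x)  # Eq. (16)
x = layers.GRU(32)(x)                                       # lightweight gated summary of the attended sequence
outputs = layers.Dense(1, activation="sigmoid")(x)          # bullying vs. non-bullying
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```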
4. Experimental Setup
For effective model training and evaluation, this work makes use of Google Colab and Jupyter Notebook, utilising Google's virtual GPU. The implementation used Python 3.12.4 on a Windows 11 computer. The hardware used for the experiments had a Core i7 processor, 10–15 GB of available storage, and 32 GB of RAM. A stable internet connection was necessary for API queries to run well, especially when accessing external data. A variety of Python frameworks and libraries were used in the project: Scikit-learn for machine learning tools, Pandas and NumPy for data manipulation, Matplotlib and Seaborn for data visualisation, and TensorFlow and Keras for model construction. GloVe embeddings were employed to improve the semantic comprehension of the text data, while NLTK supported natural language processing tasks.
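The sketch below illustrates one way this preprocessing stack could fit together (NLTK stop-word removal, Keras tokenisation and padding, and a GloVe embedding matrix); the example comments, the GloVe file name, and the vocabulary and sequence limits are assumptions.

```python
import numpy as np
import nltk
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

nltk.download("stopwords", quiet=True)
stops = set(stopwords.words("english"))

comments = ["You are awesome", "Nobody likes you, just leave"]      # placeholder comments
cleaned = [" ".join(w for w in c.lower().split() if w not in stops) for c in comments]

tokenizer = Tokenizer(num_words=20000)                              # vocabulary limit (assumed)
tokenizer.fit_on_texts(cleaned)
X = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=100)

# Build an embedding matrix from pre-trained GloVe vectors (file name and dimension assumed)
emb_dim, emb_index = 100, {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        emb_index[parts[0]] = np.asarray(parts[1:], dtype="float32")

emb_matrix = np.zeros((len(tokenizer.word_index) + 1, emb_dim))
for word, i in tokenizer.word_index.items():
    vec = emb_index.get(word)
    if vec is not None:
        emb_matrix[i] = vec
```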
4.1. Classification Metrics
Accuracy: It evaluates the validity of the framework's predictions and is the most basic metric. It is calculated as the number of correct predictions divided by the total number of predictions.
Precision: Determines the proportion of all positive predictions that are true positives. "How many of all the events that were predicted as positive were actually positive?" is the question it addresses.
Recall: Recall (sensitivity) measures the percentage of actual positive cases that were identified. It answers the question, "How many of the total actual positive items were correctly predicted?"
F1-Score: The F1-Score is a balanced metric that considers both false positives and false negatives. It is calculated as the harmonic mean of precision and recall.
AUC-ROC: AUC-ROC is the area under the receiver operating characteristic curve. It represents the model's ability to distinguish between classes; a higher AUC-ROC indicates better performance.
Confusion Matrix: A confusion matrix is an array of numbers that contrasts the model's predictions with the actual labels, showing the counts of true positives, true negatives, false positives, and false negatives. Compared to accuracy alone, it offers a more thorough understanding of the model's performance.
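These metrics map directly onto scikit-learn's helpers, as the short sketch below shows; the labels and predicted probabilities are placeholder values.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                      # actual labels (1 = bullying)
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.1, 0.3]      # predicted probabilities
y_pred = [int(p >= 0.5) for p in y_prob]               # thresholded predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```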
4.2. Dataset Description
The dataset used for this study consists of 18,148 comments collected from various internet venues. Based on sentiment analysis, 11,661 comments were labelled as negative, while 6,487 comments were classified as positive.
The dataset was collected from two main sources. YouTube web-scraped metadata: comments were retrieved through the YouTube Data API v3 with a valid API key. Additional data was integrated from a freely accessible Kaggle dataset labelled "Cyberbullying Classification" [31]. The dataset includes comments from Facebook, Twitter, and YouTube. The sentiment classification, which made it possible to separate comments into binary categories, eased the evaluation of online interactions and behaviour, especially in the context of cyberbullying and online aggression. Applying sentiment analysis to the web-scraped data, the comments were classified as either positive or negative.
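A hedged sketch of how the two sources might be merged into a single binary-labelled frame with Pandas; the file names and column names here are hypothetical, not the actual files used in this study.

```python
import pandas as pd

# File and column names below are hypothetical placeholders.
youtube = pd.read_csv("youtube_comments.csv")                 # scraped via the YouTube Data API v3
kaggle = pd.read_csv("cyberbullying_classification.csv")      # Kaggle dataset [31]

df = pd.concat([youtube[["comment", "sentiment"]],
                kaggle[["comment", "sentiment"]]], ignore_index=True)
df["label"] = (df["sentiment"] == "negative").astype(int)     # 1 = negative/bullying, 0 = positive
print(df["label"].value_counts())                             # the study reports 11,661 negative and 6,487 positive
```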
5. Results and Discussion
This study offered techniques for spotting online cyberbullying. The proposed pipeline comprises several stages: effective text preparation to clean the comments, feature extraction to convert the text into numerical data, and the application of several techniques to classify and detect cyberbullying.
The model's overall performance shows good generalisation ability and efficient learning. Both the training and validation accuracy curves show a steady increasing trend during the training process, suggesting that the model is gradually picking up the underlying patterns in the data. The accuracy stabilises at about 93.69% after 15 epochs, indicating that the framework has converged.
Crucially, neither overfitting nor underfitting is clearly visible. Overfitting usually occurs when training accuracy keeps rising while validation accuracy falls; in this instance, both measures rise together and stay very similar. This balance illustrates how well the model generalises to new data. The consistently good performance on both sets also confirms the absence of underfitting, showing that the framework is suitably complex and well optimised to capture the required characteristics from the input data.
[Figure: Classification metrics (accuracy, F1-score, recall, and precision) for RNN, LSTM, GRU, Bi-LSTM, traditional NN, logistic regression, support vector machine, naïve Bayes, random forest, decision tree, K-nearest neighbours (KNN), and the proposed model.]
           Fig. 7: Training and Validation Accuracy & Loss curve for LSTM Model
This finding is further supported by the training and validation loss curves. A well-converged model is characterised by a progressive flattening after a quick reduction in the early epochs, as seen in both curves. The validation accuracy and loss show slight variations at later epochs (e.g., epochs 7–10), but both are within expected ranges and are most likely the result of typical variation from mini-batch updates during training.
These data lead to the conclusion that the ideal training period for this model is 15 epochs.
Beyond this point, continuing training could result in diminishing returns and raise the
possibility of overfitting without appreciable performance improvement. The model's
robustness, stability, and suitability are confirmed by the training behaviour, as reflected in the accuracy and loss measures.
     Fig. 8: Training and Validation Accuracy & Loss curve for GRU Model
Fig. 9: Training and Validation Accuracy & Loss curve for Bi-LSTM Model
Fig. 10: Training and Validation Accuracy & Loss curve for Traditional NN Model
  Fig. 11: Training and Validation Accuracy & Loss curve for proposed model
Fig. 12: Confusion matrix for Logistic Regression and Support Vector Machine
               Fig. 13: Confusion matrix for Naïve Bayes and Random Forest
Fig. 14: Confusion matrix for Decision Tree and K-Nearest Neighbours (KNN)

Table 1. Comparative Analysis of the contemporary methods and the proposed approach

Authors                      Method                                   Accuracy (%)   F1-Score
Balakrishnan et al. [17]     Machine learning techniques              91.88          N/A
Al-Khasawneh et al. [25]     Multi-modal approach                     92.1           86.4
F. Razi and N. Ejaz [24]     m-BERT and MuRIL                         N/A            92
Proposed                     LSTM-BiLSTM-GRU-Attention mechanism      93.69          95.09
6. Conclusion
With the aim of improving the effectiveness of cyberbullying detection systems, we introduced a hybrid deep learning architecture in this paper that combines LSTM, Bi-LSTM, and GRU with an attention mechanism. We were able to create a strong and efficient model by utilising the advantages of each element: the attention mechanism's focus on pertinent characteristics, the computational efficiency of GRU, the bidirectional context awareness of Bi-LSTM, and the LSTM's capacity to capture long-term dependencies. A more
sophisticated comprehension of the patterns of speech and contextual clues commonly present
in cyberbullying content is made possible by the merging of these layers. When compared to
standalone or conventional models, experimental results show that using this hybrid
architecture greatly increases detection accuracy. This method helps create safer and more
welcoming online environments in addition to advancing the area of automated online
harassment identification. For wider applicability, future research may examine further
optimization strategies, cross-lingual abilities, and real-time deployment.
Data Availability:
        The dataset used in this study is available at Kaggle Community [31]. The description
of the dataset is also mentioned in the paper.
Conflicts of Interest:
        The authors declare that there is no conflict of interest regarding the publication of
this paper.
References
[1] Feinberg, T.; Robey, N. Cyberbullying. Educ. Dig. 2009, 74, 26.
[2] Nikolaou, D. Does cyberbullying impact youth suicidal behaviors? J. Health Econ. 2017, 56, 30–46.
[3] Statista. Most popular social networks worldwide as of January 2025, ranked by number of monthly active
users, 2025.
[4] Brailovskaia, J.; Teismann, T.; Margraf, J. Cyberbullying, positive mental health and suicide
ideation/behavior. Psychiatry Res.2018, 267, 240–242.
[5] Lu, N.; Wu, G.; Zhang, Z.; Zheng, Y.; Ren, Y.; Choo, K.K.R. Cyberbullying detection in social media text
based on character-level convolutional neural network with shortcuts. Concurr. Comput. Pract. Exp. 2020, 32,
e5627.
[6] Arif, Muhammad. "A systematic review of machine learning algorithms in cyberbullying detection: future
directions and challenges." Journal of Information Security and Cybercrimes Research 4.1 (2021): 01-26.
[7] Hasan, Md Tarek, et al. "A review on deep-learning-based cyberbullying detection." Future Internet 15.5
(2023): 179.
[8] Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning
phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014,
arXiv:1406.1078.
[9] Fang, Y., Yang, S., Zhao, B., & Huang, C. (2021). Cyberbullying detection in social networks using bi-gru
with self-attention mechanism. Information, 12(4), 171.
[10] Gada, Mihir, Kaustubh Damania, and Smita Sankhe. "Cyberbullying Detection using LSTM-CNN
architecture and its applications." 2021 International Conference on Computer Communication and Informatics
(ICCCI). IEEE, 2021.
[11] Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015,
arXiv:1508.01991.
[12] Chen, Hsin-Yu, and Cheng-Te Li. "HENIN: Learning heterogeneous neural interaction networks for
explainable cyberbullying detection on social media." arXiv preprint arXiv:2010.04576 (2020).
[13] Caroppo, A.; Leone, A.; Siciliano, P. Comparison between deep learning models and traditional machine
learning approaches for facial expression recognition in ageing adults. J. Comput. Sci. Technol. 2020, 35, 1127–
1146.
[14] Yilmaz, A.; Demircali, A.A.; Kocaman, S.; Uvet, H. Comparison of Deep Learning and Traditional
Machine Learning Techniques for Classification of Pap Smear Images. arXiv 2020, arXiv:2009.06366.
[15] Finizola, J.S.; Targino, J.M.; Teodoro, F.G.S.; Moraes Lima, C.A.d. A comparative study between deep
learning and traditional machine learning techniques for facial biometric recognition. In Proceedings of the
Ibero-American Conference on Artificial Intelligence, Trujillo, Peru, 13–16 November 2018; Springer:
Berlin/Heidelberg, Germany, 2018; pp. 217–228.
[16] Picon, A.; Alvarez-Gila, A.; Irusta, U.; Echazarra, J. Why deep learning performs better than classical
machine learning? Dyna Ing.Ind. 2020, 95, 119–122.
[17] Balakrishnan, Vimala, Shahzaib Khan, and Hamid R. Arabnia. "Improving cyberbullying detection using
Twitter users’ psychological features and machine learning." Computers & Security 90 (2020): 101710.
[18] Perera, Andrea, and Pumudu Fernando. "Cyberbullying detection system on social media using supervised
machine learning." Procedia Computer Science 239 (2024): 506-516.
[19] Chatzakou, D., Leontiadis, I., Blackburn, J., Cristofaro, E. D., Stringhini, G., Vakali, A., & Kourtellis, N.
(2019). Detecting cyberbullying and cyberaggression in social media. ACM Transactions on the Web
(TWEB), 13(3), 1-51.
[20] Fahim, Kaji Mehedi Hasan, et al. Deep learning approaches for Bengali cyberbullying detection on social
media: a comparative study of BiLSTM, BiGRU and BERT models. Diss. Brac University, 2023.
[21] Batani, John, et al. "A review of deep learning models for detecting cyberbullying on social media
networks." Computer Science On-line Conference. Cham: Springer International Publishing, 2022.
[22] López-Vizcaíno, Manuel F., et al. "Early detection of cyberbullying on social media networks." Future
Generation Computer Systems 118 (2021): 219-229.
[23] Xingyi, Guo, and H. Adnan. "Potential cyberbullying detection in social media platforms based on a multi-
task learning framework." International Journal of Data and Network Science 8.1 (2024): 25-34.
[24] F. Razi and N. Ejaz, "Multilingual Detection of Cyberbullying in Mixed Urdu, Roman Urdu, and English
Social Media Conversations," in IEEE Access, vol. 12, pp. 105201-105210, 2024, doi:
10.1109/ACCESS.2024.3432908.
[25] Al-Khasawneh, Mahmoud Ahmad, et al. "Towards Multi-Modal Approach for Identification and Detection
of Cyberbullying in Social Networks." IEEE Access (2024).
[26] Ombabi, A. H., Ouarda, W., & Alimi, A. M. (2020). Deep learning CNN–LSTM framework for Arabic
sentiment analysis using textual information shared in social networks. Social Network Analysis and Mining, 10,
1-13.
[27] Waqas, Muhammad, and Usa Wannasingha Humphries. "A critical review of RNN and LSTM variants in
hydrological time series predictions." MethodsX (2024): 102946.
[28] https://dagshub.com/blog/rnn-lstm-bidirectional-lstm/
[29] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I.
Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems
(NIPS 2017), Long Beach, CA, USA,4–9 December 2017.
[30] Rizwan, Hammad, Muhammad Haroon Shakeel, and Asim Karim. "Hate-speech and offensive language
detection in roman Urdu." Proceedings of the 2020 conference on empirical methods in natural language
processing (EMNLP). 2020.
[31] https://www.kaggle.com/datasets/shauryapanpalia/cyberbullying-classification