Abstract: Cyberbullying is a serious concern in today’s digital age. The rapid increase in the use of social media platforms has made cyberbullying even more prevalent, and its form has also evolved with time. In the era of Web 1.0, cyberbullying was limited to text-based data, but with the advent of Web 2.0 and 3.0 it has expanded to images and multi-modal data. Detecting cyberbullying in text-based data is relatively easy, as various natural language processing (NLP) techniques can be used to identify offensive language and sentiment. However, detecting cyberbullying in image-based data is a major challenge because images do not have a clear textual representation. Hence, bullies often try to bypass existing cyberbullying detection techniques by using images and multi-modal data. We propose a deep learning technique named the Combinational Network for Bullying Detection (CNBD), which combines two networks: a BEiT (Bidirectional Encoder representation from Image Transformers) network and a Multi-Layer Perceptron (MLP) network. To improve the performance of the CNBD, we supply two additional inputs obtained through Image Captioning (IC) and Optical Character Recognition (OCR), the latter used to extract text overlaid on the images. The experimental results show that the two additional inputs give the CNBD technique an advantage in terms of accuracy, precision, and recall.
Keywords: Cyberbullying, Social media, Multi-Layer Perceptron, Deep learning, OCR, Image Captioning.
• Proposed the CNBD technique for cyberbullying (CB) detection in images.

• Fine-tuned the transformer-based network BEiT to our downstream task using a cyberbullying image dataset (an illustrative fine-tuning sketch is given at the end of this section).

• Fine-tuned VGG16 and LSTM (Long Short-Term Memory) architectures on the MS-COCO dataset to generate image captions.

• Built a Multi-Layer Perceptron network to improve the performance of the model.

The rest of the paper is organized as follows. Related work on cyberbullying in images is discussed in Section 2. The main contribution of this research, the proposed CNBD technique, is presented in Section 3. The results and discussion are presented in Section 4. Finally, Section 5 concludes the paper and outlines possible future enhancements.
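For illustration, the following is a minimal sketch of how a pre-trained BEiT backbone could be adapted to the binary cyberbullying image classification task, using the Hugging Face transformers library; the checkpoint name and all hyper-parameters are illustrative assumptions and not the authors' exact configuration.

import torch
from transformers import BeitForImageClassification, BeitImageProcessor

# Load a pre-trained BEiT checkpoint and replace its 1000-class ImageNet
# head with a 2-class head (bullying vs. non-bullying).
processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained(
    "microsoft/beit-base-patch16-224",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
model.eval()

def classify(pil_image):
    """pil_image: a PIL.Image in RGB; returns the predicted class index."""
    inputs = processor(images=pil_image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item()

The newly attached classification head (and, optionally, the backbone) would then be fine-tuned on the cyberbullying image dataset with a standard cross-entropy objective.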
2. Related Works
In this section, we present past research related to cyberbullying in image data. P. K. Roy and Mali [4] proposed a transfer-learning-based automated model for cyberbullying detection in images from social media networks. The proposed model extracts hidden features from cyberbullying images. The experiments were carried out with two different datasets of 1000 images and 3000 images. They consider three deep learning models for cyberbullying detection in images: a 2-dimensional CNN, Visual Geometry Group 16 (VGG16), and InceptionV3. Among the three models, InceptionV3 performs best in terms of precision (87%).

Homa Hosseinmardi et al. [5] present a novel approach to detecting cyberbullying incidents on the Instagram social network. The authors propose a system that uses machine learning algorithms to automatically classify Instagram posts as either cyberbullying or non-cyberbullying. The system uses a combination of text and image features extracted from the posts to train a classification model. The authors evaluate the system’s performance on a dataset of 10,000 Instagram posts manually labeled as cyberbullying or non-cyberbullying. The results show that the system achieves a high accuracy (92%) in detecting cyberbullying incidents on Instagram. The paper also provides a detailed analysis of the features that contribute most to the classification performance of the system. The authors believe that their approach can be used to develop effective tools for combating cyberbullying on social media platforms.

Haoti Zhong et al. [6] proposed a content-based approach for detecting cyberbullying on the Instagram social network. The authors developed a system that uses NLP and ML techniques to analyze the textual content of Instagram posts and comments. The system uses a set of features such as sentiment, emotion, and content similarity to identify posts and comments that contain cyberbullying. The authors evaluate the performance of the system on a dataset of 22,899 Instagram posts and comments, manually annotated as cyberbullying or non-cyberbullying. The reported results indicate that the proposed model achieves an accuracy of 91.4% in detecting cyberbullying incidents on Instagram.

Mahmoud Elmezain et al. [7] proposed a hybrid classification model based on transformers and SVM to predict whether bullying takes place or not. Using the proposed combined models with the SVM classifier, the authors report an accuracy of 96.05%. Furthermore, the proposed model has a 99% classification accuracy for the bullying class and a 93% accuracy for the non-bullying class. The study highlights the negative impact of bullying on students’ academic performance and the importance of taking appropriate anti-bullying action and raising community awareness of the problem. The authors suggest that future work will focus on using Twitter texts together with Google Forms questionnaires for classifying cyberbullying and determining how to stop it.

Rui Cao et al. [8] proposed a model, PromptHate, that uses pre-trained RoBERTa language models and constructs simple prompts to prompt the model for hateful meme classification. To make use of the latent knowledge in the pre-trained language models, the authors present real-world examples. The model’s performance is measured against state-of-the-art baselines, and the findings demonstrate that it excels with an AUC of 90.96 on two publicly available datasets. Agarwal et al. [9] presented two approaches to identifying hateful memes using deep learning techniques. The first method incorporates features from several modalities, while the second employs sentiment analysis based on image captioning and the text placed on the meme itself. These methods combine three deep learning components—GloVe embeddings, an encoder-decoder network, and OCR—trained with the Adamax optimizer. The authors use the Facebook Hateful Memes Challenge dataset, which includes over 8,500 meme images, to test the methods. Both methods were submitted to the ongoing challenge competition, and both show promise on the validation dataset. K. R. Prajwal et al. [10] proposed a novel method to capture the content of social media images. They implemented a two-stage approach for image captioning. In stage 1, emotional representations are captured using transfer learning, and in stage 2, facial emotions are extracted using encoders derived from stage 1.

T. Tiwary et al. [11] introduced an Automatic Image Captioning (AIC) technique to help visually impaired consumers identify products in online grocery stores. To solve this problem, they proposed an ECANN (Extended Convolutional Atom Neural Network). For caption extraction from e-commerce image data, the ECANN model combines the LSTM architecture with a CNN. On the Grocery Store dataset, the proposed ECANN model achieved an accuracy of 99.46%, and on the Freiburg Groceries dataset, it achieved an accuracy of 99.32%. Al-Malla et al. [12] proposed an image captioning model with an attention-based encoder and decoder that uses attention and object features to mimic human image understanding.
Vijaya Kumar et al.: CNN with ReLU activation function; custom CNN used for feature extraction. Datasets: NSFW and SFW. Metrics: 82% accuracy.
Haoti Zhong et al.: Latent Dirichlet Allocation, pre-trained CNN, and SVM classifier; feature extraction using image captioning. Dataset: 3000 images collected from Instagram. Metrics: 68.55% accuracy.
P. K. Roy and F. U. Mali: 2D CNN and transfer learning using VGG16 and InceptionV3; unable to detect textual bullying in images. Datasets: two datasets of 1000 and 3000 images. Metrics: 87% F1-score.
Homa Hosseinmardi et al.: N-gram features with an SVM classifier; confined to Instagram images only. Dataset: 998 media sessions. Metrics: 87% accuracy, 88% precision, 87% recall.
Mahmoud Elmezain et al.: Hybrid model with an SVM classifier; unable to detect textual bullying in images. Dataset: 1200 images. Metrics: 96.05% accuracy.
Nishanth Viswamitra et al.: Multimodal classifier; low-level image features with manual feature selection. Dataset: 19000 images. Metrics: 93.46% accuracy, 94.27% precision, 96.93% recall.
The generated caption provides a textual description of the contents of an image. A RoBERTa [22] architecture was employed to generate text features from the image caption.

C. Text Extraction from Images
After that, we employed the Tesseract API [23] to extract the text overlaid on the images. The extracted text is passed to RoBERTa to generate text features of the extracted text.

The sigmoid activation used in the network is defined as σ(y) = 1 / (1 + e^(−y)); here, y is the input to the sigmoid function and e is Euler’s number.
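A minimal sketch of this text-extraction step is given below, assuming the pytesseract wrapper for the Tesseract engine and the roberta-base checkpoint from the Hugging Face transformers library; the pooling choice (the first-token embedding) is an illustrative assumption rather than the authors' exact configuration.

import pytesseract
import torch
from PIL import Image
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
text_encoder = RobertaModel.from_pretrained("roberta-base")
text_encoder.eval()

def ocr_text_features(image_path):
    # 1) Extract any text overlaid on the image with Tesseract.
    text = pytesseract.image_to_string(Image.open(image_path))
    # 2) Encode the extracted text with RoBERTa and keep the first-token
    #    embedding as a fixed-size (768-dimensional) text feature vector.
    tokens = tokenizer(text or " ", return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        out = text_encoder(**tokens)
    return out.last_hidden_state[:, 0, :]   # shape: (1, 768)

The same encoding step can be reused for the image-caption text, so that both auxiliary inputs arrive at the classifier as vectors of the same size.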
Figure 9. Original image is converted into patches.

… stage; hence, the images had different formats, sizes, and structures. In the data preprocessing stage, we applied the data augmentation techniques of normalization, resizing, and rescaling. Finally, all input images are set to a height and width of 224×224 pixels with RGB channels (Red, Green, and Blue).

In a similar way, the input image is passed to VGG16 and LSTM for image captioning. Figure 10 shows the captions generated by the network for the given input images. Similarly, the input image is passed to OCR for text extraction from the images. Figure 11 shows the text extracted from the input images using OCR.

Figure 11. Example of text extraction from input images using OCR.
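A minimal sketch of the preprocessing described above is shown below, assuming the torchvision library; the normalization statistics (ImageNet means and standard deviations) are an assumed, commonly used choice and not necessarily the authors' exact values.

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                 # fixed height and width
    transforms.ToTensor(),                         # rescale pixel values to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Usage: tensor = preprocess(pil_image.convert("RGB"))  -> shape (3, 224, 224)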
The comparison also covers the F1-score of the proposed technique against existing models. The accuracy and loss curves over the training epochs are shown in Figure 13. It can be observed that the loss on the training data is lower than the loss on the test data, as expected, since the test data is not seen during the training phase. It can also be observed that the accuracy shows a positive trend as the number of training epochs increases. This demonstrates the adaptive nature of the network to our downstream task.

5. Conclusion and Future Enhancements
As the usage of social media platforms continues to grow, so too does the prevalence of negative online behaviors such as cyberbullying, online hate speech, and trolling. Consequently, there is a growing need to explore effective ways to detect and address these harmful activities. One important aspect of this is the detection of cyberbullying on social media, which presents a particular challenge due to the diverse forms in which it can manifest, including text, images, and multimedia content. We proposed a technique named CNBD for cyberbullying detection in images. The proposed technique was evaluated using the metrics of accuracy, precision, and recall. The experimental results show that the proposed method with image caption features and OCR text features improves on existing techniques, achieving an accuracy of 98.23%, a precision of 98.05%, and a recall of 98.05%. In future work, we will consider cyberbullying detection for multimedia data such as text combined with images and videos, as well as regional languages such as Telugu, Tamil, and Hindi.

References
[1] S. Hinduja and J. W. Patchin, “Bullying, cyberbullying, and suicide,” Cyberbullying Research Center, vol. 14, pp. 206–221, 2010.
[2] R. M. Kowalski, G. W. Giumetti, A. N. Schroeder, and M. R. Lattanner, “Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth,” Psychological Bulletin, vol. 140, pp. 1073–1137, 2014.
[3] F. Mishna, M. Khoury-Kassabri, and J. Daciuk, “Risk factors for involvement in cyberbullying: Victims, bullies, and bully-victims,” Children and Youth Services Review, vol. 70, pp. 274–282, 2016.
[4] P. K. Roy and F. U. Mali, “Cyberbullying detection using deep transfer learning,” Complex & Intelligent Systems, vol. 8, pp. 5449–5467, 2022.
[5] H. Hosseinmardi, S. A. Mattson, R. I. Rafiq, R. Han, Q. Lv, and S. Mishra, “Detection of cyberbullying incidents on the Instagram social network,” Association for the Advancement of Artificial Intelligence, 2015.
[6] H. Zhong, H. Li, A. Squicciarini, S. Rajtmajer, C. Griffin, D. Miller, and C. Caragea, “Content-driven detection of cyberbullying on the Instagram social network,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16), pp. 3952–3958, 2016.
[7] M. Elmezain, A. Malki, I. Gad, and E.-S. Atlam, “Hybrid deep learning model–based prediction of images related to cyberbullying,” International Journal of Applied Mathematics and Computer Science, vol. 32, pp. 324–333, 2022.
[8] R. Cao, R. K.-W. Lee, W.-H. Chong, and J. Jiang, “PromptHate: Prompt-based hateful meme classification with pre-trained language models,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 321–332, 2022.
[9] Aggarwal, T. Sharma, Yadav, Agrawal, Singh, Mishra, and Gritli, “Two-way feature extraction using sequential and multimodal approach for hateful meme classification,” IEEE Access, vol. 9, pp. 121962–121973, 2021.
[10] K. R. Prajwal, C. V. Jawahar, and P. Kumaraguru, “Towards increased accessibility of meme images with the help of rich face emotion captions,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 202–210, 2019.
[11] T. Tiwary and R. P. Mahapatra, “An accurate generation of image captions for blind people using extended convolutional atom neural network,” Multimedia Tools and Applications, vol. 82, pp. 3801–3830, 2022.
[12] M. A. Al-Malla, A. Jafar, and N. Ghneim, “Image captioning model using attention and object features to mimic human image understanding,” Journal of Big Data, vol. 9, 2022.
[13] E. Blaier, I. Malkiel, and L. Wolf, “Caption enriched samples for improving hateful memes detection,” in Conference on Empirical Methods in Natural Language Processing, 2021.
[14] Y. Zhou and Z. Chen, “Multimodal learning for hateful memes detection,” in IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6, 2020.
[15] P. Lyu, C. Zhang, S. Liu, M. Qiao, Y. Xu, L. Wu, K. Yao, J. Han, E. Ding, and J. Wang, “MaskOCR: Text recognition with masked encoder-decoder pretraining,” arXiv preprint arXiv:2206.00311, 2022.
[25] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.