Deep Acoustic Modelling For Quranic Recitation - Current Solutions and Future Directions
3 authors, including Muhammad Aleem Shakeel. All content following this page was uploaded on 12 July 2024.
Abstract: The Holy Quran has the utmost importance for the Muslim community, and to get the full reward, the Quran should be read according to the prescribed rules. In the past few years, this field has gained considerable importance among researchers who aim to automate the Quranic reading and understanding process with the help of Machine Learning and Deep Learning, knowing that it presents many challenges. To date, many research categories have been explored. However, the field still lacks a holistic, detailed survey of all the categories and methodologies used to solve these problems. We focused this paper on being a one-stop shop for interested readers, so that they can find (i) all related information and (ii) future gaps in research. This paper provides a detailed survey of Deep Modeling for Quranic Recitation to address these challenges. We discuss all major categories of speech analysis, including the most advanced feature extraction techniques, mispronunciation detection using Tajweed rules, reciter and speech dialect classification, and the implementation of Automatic Speech Recognition (ASR) on Quranic recitations. We also discuss research challenges in this domain and identify possible future gaps.

Index Terms: Speech Analysis, Feature Extraction, Mispronunciation Detection, Tajweed, Reciter Classification, Automatic Speech Recognition, Deep Learning

1. INTRODUCTION

New options have opened to improve our comprehension of Quranic recitation and bring it to a broader audience using machine learning and deep learning. Because of its utmost significance in academics and research, researchers have continually investigated new approaches to improve the Quranic recitation experience and support its accurate preservation and meaning. The use of deep acoustic modeling techniques for Quran recitation is one topic that has recently attracted an immense amount of attention. Speech recognition, natural language processing, and deep learning have succeeded in many areas. In speech recognition and voice analysis, deep acoustic modeling uses complex neural networks to process and evaluate acoustic data. Researchers aim to better understand the art of "Tilawah" by using these cutting-edge techniques for Quranic recitation. They also seek to increase the accuracy and robustness of current Quranic recitation recognition systems.

This review aims to provide an understanding of the advancements made in the fields of Automatic Speech Recognition (ASR), contextualized classification of Quranic topics, improvements in the reading of the Quran using deep learning, feature extraction techniques and their comparisons, reciter classification, and Tajweed, Hijayah, and Makhraj correction and classification based on the context as well.
Table 1: Existing review papers
2022 | (3) | Speech Genres in the Quran | Highlights the kind of evidence that researchers should focus on while investigating genres and explains mistakes and problems that should be considered.
2023 | (4) | NLP for Quranic Research | Serves as a synthesis compendium of works that span speech recognition-based Qur'anic recitation correction to computerized morphological evaluation.
This review further aims to evaluate the efficiency and performance of deep learning models for Quranic recitation assessment compared to more conventional methods, and to highlight challenges in existing state-of-the-art research and propose new directions in this domain.

1.1 Existing Reviews

Table 1 presents a comprehensive overview of existing review papers in the domain of Quranic knowledge, focusing on topics such as ontology methods, text classification, speech genres, and the application of Natural Language Processing (NLP). Notably, these surveys broadly cover Artificial Intelligence without concentrating on a single research domain. Through our critical analysis, we identified a significant research gap, particularly the lack of emphasis on acoustic modeling. This observed gap was a primary motivation for undertaking the current survey. Our objective was to address this limitation by focusing exclusively on acoustic modeling and its application to the recitation of the Quran. By narrowing our scope to researchers actively engaged in acoustic modeling and Quranic recitation, we aimed to provide a specialized and in-depth exploration of this specific research domain. During our literature review within this focused domain, we observed a scarcity of recent and up-to-date surveys. Existing surveys in this area were either outdated, unable to capture the emergence of recent works, or failed to address evolving issues within the field. This realization further fueled our motivation to contribute a timely and comprehensive survey that not only bridges the identified gap but also offers insights into the latest developments in acoustic modeling for Quranic recitation.

State-of-the-art research investigates how to deal with this domain. Rusli et al. (1) presented a semantic ontology for Quranic knowledge. They delivered a detailed systematic review showing that available Quranic ontology models are limited to domains like nouns, subjects, pronouns, antonyms, and Islamic knowledge in the Quran because they do not account for all concepts in the Quran. To give an in-depth evaluation of this field, the research seeks relevant works from various electronic data sources. Their study thoroughly evaluated the literature pertinent to current ontology models to spread an accurate understanding of the Quran using semantic technologies.

Wahdan et al. (2) presented text classification for the Quran and the Arabic language using deep learning models. They focused on text classification techniques based on deep learning, including CNN, RNN, and LSTM. They thoroughly analyzed the system models of 12 research papers on the topic, along with their accuracies and results, showing which model works best for the purpose. They also suggested which models to use to improve text classification.

Devin et al. (3) presented different approaches to identifying speech dialects in Quranic recitation. They provided readers with basic guidelines for interpreting Qur'anic passages, emphasized the kinds of evidence researchers should concentrate on when looking into genres and dialects, and discussed errors and pitfalls that should be considered in subsequent studies. The authors suggested that distinctive words, phrases, and structures must all be carefully examined. Their discussion emphasizes how specific dialect texts are incorporated into Surahs or longer passages within Surahs, and demonstrates how the Qur'an references pre-existing categories and alters and transforms them.

Huzaifa et al. (4) presented how NLP can be used for Quranic research, focusing on Quranic commentaries and exegesis. They combined NLP with speech recognition to improve Quranic recitation and showed that NLP methods aid in creating tools that make learning easier for lay readers. Their study provides an overview of the many Qur'anic NLP initiatives and serves as a synthesis compendium of works spanning the spectrum from automated morphological examination to speech recognition-based Qur'anic recitation correction.

A comprehensive review of Deep Modeling for Quranic recitation will help the research community understand these concepts. However, the surveys mentioned in Table 1 still lag and face challenges that researchers must investigate, and they mainly focus on NLP, text classification, and semantic ontology. That is where this paper comes in
to emphasize acoustic modeling and speech analysis in Quranic recitation. The sole focus of this review paper is to find the research gaps and challenges the research community faces in speech recognition and deep learning modeling.

Figure 1: Article Organization for Literature Review of State-of-the-Art Research

1.2 Scope and Contribution

The goal of this systematic review is to thoroughly evaluate the existing research using deep learning methods in the field of Quranic recitation. Significant contributions are made to deep acoustic modeling for Quranic recitation. The following are a few of the significant contributions:

- We present a thorough and integrated overview of current state-of-the-art deep learning techniques for analyzing Quranic recitation by methodically studying a wide range of research articles.
- We discover new patterns and prospective fields for further research through analysis. Future researchers will gain insightful knowledge from this assessment of research gaps, which will help them develop state-of-the-art research ideas.
- The methodology, statistics, and assessment metrics used in the papers are rigorously evaluated. This assessment will aid in comprehending the benefits and drawbacks of various approaches and offer suggestions for enhancing research methodologies.

The following points suggest the need for a new survey or review despite the numerous survey papers in the same field by various authors:

- Our survey focuses on the techniques and methodology used by different authors in a specific field called "Deep Acoustic Modelling for Quranic Recitation", meaning that we focus on the SOTA papers that use only audio samples for training on Quranic data; the survey is therefore aimed specifically at readers interested in audio-based training on the Quran.
- We categorize the paper based on different schemas and techniques, such as correct recitation analysis, Tajweed, Makhraj, Hijayah, Imlaah, and Automatic Speech Recognition, among other categories, creating a single one-stop shop where future researchers can study the relevant SOTA topic they are interested in and search the findings of the last ten years.
- We identify the research gaps in every paper mentioned in this survey and help future researchers define their research ideas in the specific domain of acoustic modelling. Ultimately, we conclude with a research gap that has not been touched in this domain, which can create a new research stream and change the research trend for future researchers.
- We highlight the research papers that answer the following questions: Have they come up with a new acoustic modeling technique? Is there a data set available online for future researchers? Do they compare their techniques with the rest of the algorithms?

1.3 Article Organization

This manuscript classifies different research topics for organizing Quranic recitation, as shown in Figure 1, and conducts a thorough survey of each topic. The article is organized into the following categories: Section 2 defines deep acoustic modeling for Quranic recitation. Section 2.1 describes the primary feature
extraction techniques used for classification and their results, showing which state-of-the-art process works best for the related topic, along with their survey and result comparisons. Section 2.2 presents the different reciter classification techniques used by researchers to classify reciters' voices, currently an active research topic. Section 2.3 presents thorough Tajweed, Hijayah, and Makhraj classification and mistake-correction techniques that use deep learning to improve the reading of the Quran without errors. Section 2.4 presents the Automatic Speech Recognition used for Quranic recitation, including hybrid HMM-BLSTM modeling and end-to-end Transformer modeling, and Section 2.5 describes the classification of different artifacts of Quran recitation, including feature identification on both acoustic and textual data and classifying the different Maqams of Quranic recitation.

2. BACKGROUND AND LITERATURE ON MODELING OF QURANIC RECITATION

Recitation is more than just reading the text of the Quran; it follows standards and rules guiding pronunciation, rhythm, and melody. In deep acoustic modeling for Quranic recitation, deep neural networks, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), are trained to recognize and capture the complex auditory patterns unique to Quranic recitation. Recording Arabic phonemes precisely and following Tajweed guidelines for proper pronunciation is necessary. The model also emphasizes intonation and melody to accurately simulate the pitch fluctuation and the lengthening of some syllables in Quranic recitation. The ultimate goal of deep acoustic modeling for Quranic recitation is to develop a technologically enhanced tool that makes it easier to learn and master the complex art of recitation.

2.1 Feature Extraction

Raw audio signals are frequently high-dimensional and packed with information. Feature extraction helps reduce the dimensionality of the data while maintaining the crucial qualities necessary for the application.

Abdo et al. (5) presented an algorithm for automatically segmenting emphatic and non-emphatic Arabic speech from Arabic audio signals. The study mainly focused on the recitation principles of the Holy Quran. The methodology was to extract important features of Arabic sound signals using the Mel-Frequency Cepstral Coefficient (MFCC), find the peak positions for boundaries in the target signal, and then evaluate the whole system on medium-level speech. They created a database for all these signals and evaluated the system on 80 Arabic recited words. The dataset consists of Arabic recited words from 6 different speakers, making the testing dataset 480 Arabic words. The segmentation accuracy achieved by the system is about 90%. For future work, more constraints could be added to the MFCC peaks to make them more efficient, or different feature extraction techniques, such as spectral envelopes and formant frequencies, could be implemented independently or in combination to increase the model's efficiency.

Even though the Holy Quran contains the same verses all over the world, a recited verse will probably differ from the same verse recited by another person, because every person's voice is distinct. Bezoui et al. (6) proposed a technique to train and test an Arabic speech system using the KALDI toolkit. The author explored the viability of different feature extraction methods for extracting important features from Quranic recitations, including the Mel-Frequency Cepstral Coefficient (MFCC). The author explained the MFCC technique in detail, including all steps, such as preprocessing, framing, and windowing. The maximum efficiency achieved by the system is 75% using the Hamming window and 55% using the rectangular window. The dataset used for this purpose includes audio files of Quranic verses. However, the work could be improved by implementing a fixed-range sliding-window technique and extracting the MFCC features for every windowed signal.

Meftah et al. (7) compared different feature extraction techniques to achieve the highest accuracy for Arabic phoneme classification. For this purpose, a dataset corpus was created from different Arabic recitations of the Holy Quran, and acoustic features were then extracted from it. These features include the Mel-Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), Perceptual Linear Prediction (PLP), Mel filter-bank coefficients (MELSPEC), log Mel filter-bank coefficients (FBANK), and Linear Prediction Reflection Coefficients (LPREFC). After feature extraction, Hidden Markov Model (HMM) classification was used. The results show that the FBANK and MELSPEC features produce the highest accuracies for the system, i.e., 85.38% and 83.37%. The MFCC and PLP results are close to these accuracies, while LPC is unsuitable for Arabic speech recognition.

Adiwijaya et al. (8) performed a comparative analysis to classify the pronunciation of Hijaiyah letters using different feature extraction techniques. The dataset includes audio samples of Hijaiyah letters and was analyzed using the Mel-Frequency Cepstral Coefficient (MFCC) and Linear Predictive Coding (LPC) and then classified using KNN. The proposed system was compared with and without Principal Component Analysis (PCA). The results showed that LPC-KNN proved to be the better Hijayah classification technique, with an accuracy
of 78.92%, compared to 59.87% for MFCC-KNN. Hence, LPC was verified to be the better feature extraction technique.

2.2 Reciters Classification

Classification of reciters of Quranic recitation aims to recognize and classify various Qaris (reciters) according to their distinctive recitation styles. This is a particular endeavor in the field of audio analysis.

Khan et al. (9) proposed a machine learning approach to recognize the reciter of the Holy Quran. A dataset of 12 different reciters reciting the last ten surahs of the Quran was used, which means the model has 12 classes to classify. Two types of approaches were used for audio representation: first, feature extraction using MFCC and the pitch of the sound; second, autocorrelograms of audio spectrograms. Naive Bayes, J48, and Random Forest were then implemented for classification. Naive Bayes and Random Forest achieved the maximum accuracy of 88%.

Munir et al. (10) proposed another feature extraction technique for speaker identification in Quranic surahs. To extract essential features, the authors used a combination of the Discrete Wavelet Transform (DWT) and Linear Predictive Coding (LPC) and then performed classification using Random Forest (RF). The system achieved a maximum accuracy of 90.90%. To improve the identification accuracy, they tried using the feature extraction techniques both one at a time and in combination to train the classification model, but this did not affect the accuracy of the classifier. The dataset used for this purpose includes Arabic recitation of the Holy Quran. The research could be continued by implementing more than two feature extraction techniques and then classifying with machine learning techniques. However, the system is trained for Arabic recitation only; the work could be extended by introducing recitation in different languages as well.

Elnagar et al. (11) proposed a supervised learning-based classification technique to classify the reciters in a Quran audio dataset. The system can identify the exact or closest reciter using machine learning techniques. The system extracts perceptual features from the audio data, including pitch, tempo, and short-time energy, and then applies a support vector machine (SVM) classifier. The model achieved an accuracy of 90%. The dataset used for this purpose includes Quranic audio of 7 reciters from Saudi Arabia, which means the model works on the Arabic dialect of Quranic speech. The work could be improved by implementing other feature extraction techniques or combinations of them.

Qayyum et al. (12) proposed a deep learning technique using Arabic audio signals for speaker identification. The audio signals were analyzed and classified by speaker using Bidirectional Long Short-Term Memory (BLSTM), which proved to be a better and less computationally expensive technique for speaker identification.

Gunawan et al. (13) developed an identification system for Quran reciters using MFCC and GMM. The dataset used for this purpose includes the Quranic recitation of 5 reciters on randomly selected verses of the Quran. Around 15 audio samples of each reciter were collected, analyzed using the Mel-Frequency Cepstral Coefficient, and then classified using a GMM classifier. The proposed system achieved 100% accuracy in identifying the reciter; the system can also reject reciters other than these five. However, the system could be extended by including variations of verses recited by different reciters from different Quranic surahs.

2.3 Correct Recitation Analysis

Makhraj, Hijayah, and Tajweed deep learning-based Quran correction is a novel and technologically advanced method for improving Quranic text comprehension and recitation. For non-Arabic-speaking Muslims, reading the Quran is always a challenging task, since many words in the Quran are written differently than they are read. To help parents solve the reading and pronunciation problems of dyslexic children, Basahel et al. (14) proposed a technique for developing an Android application supporting adaptive learning and self-paced learning. The application could convert voice recognition output into text to support e-learning but was limited to processing single words, not complete sentences or texts. However, the application could be improved to train on texts and sentences.

Ahmad et al. (45) suggested a method to identify mispronunciation of Tajweed rules using Mel-Frequency Cepstral Coefficient (MFCC) features with Long Short-Term Memory (LSTM) neural networks that use time series. The QDAT dataset is publicly available and includes over 1500 voices reciting three Tajweed rules. They compared the LSTM model with traditional ML algorithms, and the LSTM model with time series outperformed classical machine learning. LSTM's accuracy on the QDAT dataset for the three rules was 96%, 95%, and 96%, respectively.

The correct pronunciation and recitation of the Quran mainly depend on these four ideas:

2.3.1 Tajweed

Tajweed is a collection of guidelines for pronouncing and articulating Quranic texts correctly. It ensures every letter is said precisely, melodiously, and with the right rhythm, intonation, and elongation.

Ahsiah et al. (15) proposed a system for checking Tajweed and correcting the recitation of the Holy Quran. Tajweed, a set of guidelines for Al-Quran recitation, ensures correct
pronunciation, readings, and text interpretations. Experienced religious teachers have traditionally imparted this knowledge: these instructors typically listen to the students' recitations and point out any errors. This traditional approach, which requires the presence of qualified teachers, has limitations in enabling a self-learning environment. To assist students in learning and practicing accurate Al-Quran recitation independently, the authors suggested a Tajweed rule-checking system employing speech recognition technology. The proposed method can detect and highlight discrepancies between a student's recitation and experienced teachers' recitations recorded in a database. The system utilized the MFCC algorithm to extract features and HMM for classification.

Yosrita et al. (16) compared different methods of Tajweed used in the recitation of the Al-Quran and extracted their features using MFCCs.

Altalmas et al. (17) proposed a technique for correct recitation of the Quran according to the Tajweed rules. Words from the same point of articulation proved to have smaller distances, or more closely matching sounds, than words from different points of articulation. To confirm this, the authors analyzed the sounds of the words Y and I using the Mel-Frequency Cepstral Coefficient (MFCC) and then compared them using the Dynamic Time Warping (DTW) technique to find the similarities and differences. The scope of this technique is limited to two Quranic words (Y and I) only. However, the work could be extended by increasing the size of the dataset and adding all Quranic words to find the similarities and differences between all words.

Classical Arabic is very hard for non-native Arabic speakers, which makes it difficult for them to recite the Holy Quran. Short vowels in the Arabic language play an essential role in correct Tajweed. Alqadheeb et al. (18) proposed a methodology for correcting Tajweed using an audio dataset of Arabic words, including short vowels. The complete dataset includes 2892 Arabic short vowels and 84 classes. Preprocessing techniques and a CNN were then used for classification and testing. The model was tested on 312 phonemes of the Arabic language using "ALIF" as a word and achieved an accuracy of 100%. However, the system was designed to work on a single phoneme, "ALIF". In the future, the research could be extended to all Arabic phonemes, training the model on each of them.

Omran et al. (40) implemented Tajweed rules for correctly reciting and understanding the Quran. They analyzed the Arabic alphabet's five sukun-vowelized letters (Baa, Daal, Jeem, Qaaf, and Taa), which are subject to the Qalqalah rule. They used a Convolutional Neural Network (CNN) model for recognition and Mel-Frequency Cepstral Coefficients (MFCC) as the feature extraction method. The dataset contains 3322 audio samples from four expert readers, and the model achieved a validation accuracy of 90.8%.

Rajagede et al. (19) proposed a system to help users memorize the Quran without the help of a second reciter. They proposed a system that verifies the input recitation against existing Quranic data. A Manhattan LSTM network was used to verify the recitation and output a single numerical value indicating whether or not the recitation was similar, while a Siamese classifier gave a binary output. They also compared different feature techniques for preprocessing, including delta features, the Mel-Frequency Cepstral Coefficient (MFCC), and the Mel-Frequency Spectral Coefficient (MFSC), for better model performance. The dataset used for this purpose includes data from Quranic Ayahs in databases. The highest accuracy achieved by the system is 77.35%, using MFCC and Manhattan LSTM. In the future, to achieve better accuracy, a deeper Siamese LSTM model or an attention-based model, together with more training data, is recommended.

To improve the Quranic recitation system, Alqadasai et al. (20) proposed a phoneme classification system for the correct recitation of the Quran. The dataset consisted of 21 Ayahs of the Quran recited by 30 reciters. The dataset was analyzed and trained using an HMM-based ASR model. The system is optimized by integrating duration into Quranic phoneme classification and achieved an accuracy ranging from 99.87% to 100% for phoneme classification. However, the proposed methodology does not cover all recitation issues, so an extended version of the model could use more datasets to cover all Quranic recitation and Tajweed issues.

Omran et al. (21) proposed a deep learning-based approach to correctly understanding the Tajweed rules of the Holy Quran. Reading the Holy Quran precisely as it was read by the Holy Prophet (PBUH) is challenging. The authors used a dataset of Quranic audio from different Arabic reciters and focused on the letters to which the Qalqalah rules apply. Mel-Frequency Cepstral Coefficients (MFCC) were used for feature extraction, and a Convolutional Neural Network (CNN)-based model was then used for classification. The authors achieved a maximum validation accuracy of 90.8%.

To encourage the Muslim community to read the Holy Quran with correct Tajweed and without any recitation errors, Ahmad et al. (22) proposed a method to classify two ways of Tajweed, i.e., Musyafahah and Talaqqi, using Artificial Neural Networks and digital signal processing techniques. The dataset includes audio files of Idghaam with correct and false recitation. For preprocessing of the audio files, the Mel-Frequency Cepstral Coefficient technique was used to extract essential features, which were then classified using three different ANN
classifiers, including Levenberg-Marquardt optimization, Resilient Backpropagation, and Gradient Descent with Momentum. The highest accuracy achieved by the system was 77.7%, with the Levenberg-Marquardt algorithm. The system could be improved if more classes of Tajweed methods were added. Also, the dataset used for the process is small; higher accuracy could be achieved by increasing the number of audio files for every class.

2.3.2 Hijayah

Hijayah involves making sure that the written script of the Quran corresponds to the intended pronunciation during recitation.

Marlina et al. (23) proposed a machine learning technique for Makhraj recognition of Hijayah letters. For this purpose, the authors used the Mel-Frequency Cepstral Coefficient (MFCC) for feature extraction from the audio, and SVM was then used to classify the Hijayah letters. The dataset used for this purpose includes audio files of Hijayah letters. However, the results of the system could be improved by using deep learning techniques such as ANNs or CNNs for Makhraj classification.

Correct pronunciation and writing of the Hijaiyah letters in the Holy Quran are important for Muslims in order to follow the reading rules according to the practice of the Holy Prophet (PBUH). Irfan et al. (24) suggested a Dynamic Time Warping technique to find the difference between written and uttered letters. The dataset consisted of image files and sound files of Quranic letters. Principal Component Analysis (PCA) was used to process the image files, and the Mel-Frequency Cepstral Coefficient (MFCC) was used for the sound data. Both results were represented as numerical values, and the Euclidean distance was then applied to find the difference between them. The accuracy achieved was 92.85% for image matching and about 71.42% for sound matching. In the future, other methods, such as Linear Discriminant Analysis or edge matching, could be used to achieve higher accuracy. The application is limited to Hijaiyah letters only and could be extended to words or phrases to match the difference between complete phrases.

2.3.3 Makhraj

Makhraj describes the posture of the tongue, lips, and vocal cords at the point of articulation of each Arabic letter. Proper Makhraj is essential for precise pronunciation.

Correct pronunciation of Hijayah letters in the Quran is crucial. Similar-sounding letters can confuse the user, and incorrect pronunciation may alter a word's meaning. The reciter should have an excellent knowledge of the differences between Hijayah letters. For this purpose, Wahidah Arshad et al. (25) addressed recognizing the nine-point recitation articulations using speech analysis techniques. Professionals recorded the dataset of Quranic Makhraj letters in a controlled environment. The audio samples underwent preprocessing using five feature extraction techniques: MFCC, Mel spectrogram, Tonnetz, spectral contrast, and chroma. Three methods, ANN, KNN, and SVM, were used for classification. The system achieved an overall accuracy of 56% using the ANN technique. The system could be improved using deep learning techniques such as CNNs and RNNs instead of the traditional ANN, KNN, and SVM.

Farooq et al. (26) proposed a deep learning-based model for detecting mispronunciation in the Quran. The purpose of the system was to automate the manual teaching of the Quran, which typically requires a teacher or instructor. The dataset consists of Arabic audio recitations of Quranic words, and the RASTA-PLP technique was used for feature extraction. The system was trained using a Hidden Markov Model (HMM). The recognition rate achieved using RASTA-PLP is 85%. The system has been extended to build a real-time application. However, the work was initially limited to single-word Arabic phonemes, and it could be further developed to include Quranic Ayahs and different Tajweed rules, including the Qalqalah rules, Tanween, etc.

2.3.4 Imlaah and Iqlaab

Imlaah describes the lengthening of particular Arabic letters inside a word. In contrast, Iqlaab involves changing a specific letter, Noon, into a different sound (namely, the sound of "Meem") when it is followed by the Arabic letter "Ba", whereas a letter carrying an Izhaar (clear pronunciation) marking, such as "Alif", "Waw", or "Yaa", is pronounced clearly. The specific phonetic properties of these letters and how they interact in some word combinations cause these changes to happen.

Yousfi et al. (27) proposed a technique to recognize the Imlaah rules of Quranic recitation. Correct recitation includes the Tajweed rules, which are essential when studying the Quran. For this purpose, the authors used a dataset of proper Quranic audio recitations of verses and preprocessed it using feature extraction techniques such as the Mel-Frequency Cepstral Coefficient (MFCC). Hidden Markov Models (HMM) were employed for classification, and recitations collected in real time were compared with the correct recitations available in the database. The accuracy achieved by the system ranges from 68% to 85%.

Yousfi et al. (28) proposed a system for checking the Iqlaab rules of the Quran. For this purpose, the authors built a speech recognition system to recognize, identify, and point out wrong applications or mismatches of the Iqlaab rules in recitation. The dataset was preprocessed using the feature extraction technique Mel-Frequency
Cepstral Coefficient (MFCC), and Hidden Markov Models (HMM) were used for feature classification. The system achieved a maximum accuracy of 70%.

2.4 Automatic Speech Recognition (ASR)

To fully understand the delicate nature of the sacred texts of the Quran, Automatic Speech Recognition (ASR) is essential. The approach of carefully preserving the phonetic intricacies and rhythm, particularly the recitation style of Quranic verses, is called acoustic modeling. ASR for Quranic recitation bridges the gap between the spoken word and its digital representation, preserving the authenticity of the recitation while also opening the door to cutting-edge applications that allow for correct transcription, analysis, and distribution of the Islamic message.

Most state-of-the-art research on acoustic signals for the Arabic language uses the Hidden Markov Model (HMM)-Gaussian Mixture Model (GMM). However, this approach has disadvantages in generalizing to high-variance data and in handling non-linearly separable datasets. To address these problems, Thirafi et al. (29) proposed a new approach for training acoustic models of the Arabic language using deep learning techniques. The authors combined Bidirectional Long Short-Term Memory (BLSTM) networks with Hidden Markov Models (HMM) to build a hybrid system. The system showed good results for the Arabic language compared to the HMM-GMM model: the Word Error Rate (WER) was 18.39% for HMM-GMM, whereas the proposed technique reduced the WER to 4.63%. The authors also analyzed the model on different Quranic recitation styles. The system was trained on Quranic recitation by professional reciters only.
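Comparisons such as the HMM-GMM versus BLSTM-HMM results above are reported in Word Error Rate: the word-level edit distance between the recognizer's hypothesis and the reference transcript, normalized by the reference length. As a minimal illustration (this is not code from any of the surveyed systems, and the transliterated example strings are only placeholders), WER can be computed with a standard dynamic program:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of a four-word reference -> WER = 0.25
print(wer("bismi allahi alrrahmani alrraheemi",
          "bismi allahi alrrahmani alrraheem"))  # 0.25
```

Character Error Rate (CER), used alongside WER by several of the papers below, is the same computation applied to characters instead of words.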
Table 2: Detailed review of research papers that shared their dataset and system model and compared their model results with other techniques.

Title | Ref | Dataset | System Model | Speech/Text | Comparison
MFC peak-based segmentation for continuous Arabic audio signal | (5) | × | | Speech | ×
Isolated Iqlab checking rules based on the speech recognition system | (28) | × | | Speech | ×
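Several of the matching-based systems surveyed here (e.g., Irfan et al. (24), and the adjacent-letter study of Altalmas et al. (17)) align a learner's utterance with a reference recitation whose tempo differs, using Dynamic Time Warping over per-frame features. A minimal numpy sketch, with toy one-dimensional feature sequences standing in for real MFCC frames:

```python
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Dynamic Time Warping cost between two feature sequences.

    x has shape (n, d) and y has shape (m, d), e.g. per-frame MFCC vectors.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local Euclidean cost
            # extend the cheapest of the three allowed warping steps
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])

# A sequence compared with a time-stretched copy of itself warps cheaply
a = np.array([[0.0], [1.0], [2.0], [3.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])  # one frame repeated
print(dtw_distance(a, b))  # 0.0: repeating a frame costs nothing under DTW
```

A plain Euclidean comparison of the same two sequences would fail outright because their lengths differ, which is why DTW is the usual choice when reciters speak at different speeds.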
The model could be extended by introducing non-professional reciters for better system performance. Additionally, the method used QScript for the transcription system, which was unsuited for the Quran; a better transcription system could be employed. Siregar et al. (41) created a system using wavelet signal extraction and ANFIS to classify Tajwid rules, handling voice input and recognizing Al-Quran reading through recitation. Data collection, audio preprocessing, wavelet packet extraction, splitting of training and test data, and classification are the steps in the process. Twenty observations were obtained from the ten observations that were preprocessed. Six primary features and 64 rules were obtained from the wavelet decomposition method as ANFIS input variables. The data was then divided into three testing observations and seventeen training observations. The WPANFIS classification model achieved 100% correct classification, with an SSE value on the training set of 0.00081225.

Hadwan et al. (30) proposed an end-to-end system for Arabic ASR. They built an acoustic model using attention-based encoder-decoder techniques with deep learning. A Mel filter was used for feature extraction, and RNN and LSTM networks were employed to build a sizable Arabic language model for the Quran; the language model was trained on textual data. The dataset consisted of speech data from 60 reciters and textual Quranic data. The model achieved a character error rate of 1.9% and a word error rate of 6.1%. However, the model could potentially perform better by investigating better-proposed systems for large datasets or by implementing a transducer model instead of a transformer model for encoding and decoding. Ghori et al. (42) used Mel frequency cepstral coefficients and deep neural networks to construct an Arabic isolated-word voice recognition system for the Holy Quran's vocabulary. The suggested system can recognize individual words from a recited verse with adequate accuracy. It employs 14 hours of audio data to showcase the system's functional prototype and focuses on the 362 unique words found in the first and last 19 chapters of the Holy Quran. They also developed a user-friendly web-based application for the transcription of recitation words.

2.5 Other Categories

The study of the Quran comprises several intricate elements: research on various recitation styles, the categorization of alphabets, the use of text mining tools, the classification of maqams, and context-aware analysis. Distinguishing between different styles of recitation requires investigating the many ways the Quran's verses are recited, each infused with unique rhythms and accents that communicate significant meanings.

Yousfi et al. (31) proposed a technique to distinguish between different types of recitation of the Holy Quran, given the diversity of Qira'at worldwide. The dataset was created using recitations of students and expert teachers. The Mel Frequency Cepstral Coefficient (MFCC) feature extraction technique was implemented, and Hidden Markov Model (HMM) classification was applied. The system can also detect mismatches in the recitation type. Khairuddin et al. (35) developed an automated system for students to practice reciting the Quran, focusing on the "ro" alphabet. Formant analysis, the Mel Frequency Cepstral Coefficient (MFCC), and Power Spectral Density (PSD) were used for feature extraction of Quranic recitation, and Quadratic Discriminant Analysis (QDA) and Linear Discriminant Analysis (LDA) were used for classification. The system achieved a maximum accuracy of 95.8% with all 19 training features in repetition and 82.1% accuracy in the learning phase. Mahmudin et al. (43) compared how well two models performed when categorizing Quran verses according to their auditory similarities. The first model, Model B, employs the MaLSTM architecture and MFCC features; the second model, Model C, is Model B plus additional delta features. The dataset includes 172,895 Al-Quran recitation sound samples from Juzz 30, comprising 37 surahs and 564 verses. The training model used was DeepSpeech, supported by TensorFlow, and thirty percent of the samples were used as a validation set during the model training procedure. The results show that Model B, equipped with the
MFCC feature, performs optimally when identifying and categorizing audio-based Quran verses. The use of the delta feature in Models B and C negatively impacts model performance.

Nur et al. (32) developed an automated interpretation classification into two classes, "Tafsir Bil Ra'yi" and "Tafsir Bil Ma'tsur." The KNN algorithm was used and achieved an accuracy of 98.12%. Modified KNN (MKNN) and Fuzzy KNN were used for further comparison, with MKNN proving the best algorithm.

Shahriar et al. (33) proposed a system to classify eight recitation styles (Maqamat), including Tajweed, using deep learning techniques. The dataset included audio recitations of the Quran by different reciters in different styles. The system achieved its highest accuracy of 95.7% using a 5-layer ANN trained on 26 input features. However, the system could be improved by enlarging the dataset with more reciters and by implementing bidirectional LSTMs.

Moulay et al. (36) built an application framework for context-based search of any Quran chapter or verse, which also supports voice search, allowing users to search using their voice. The dataset consisted of Quranic chapter audio recited by 36 reciters, including eight famous interpretations and four translations of the Holy Quran. However, the framework has some limitations; for instance, location-based notifications using GPS are lacking entirely. The framework could be improved by including Hadith knowledge that allows users to search for Islamic teachings. Al Harere et al. (44) suggested a model consisting of a character-based beam search decoder and a CNN-Bidirectional GRU encoder that uses CTC as an objective function. They used the recently released public dataset Ar-DAD, which consists of around 37 chapters recited by 30 reciters with various pronunciation standards and recitation speeds. Word error rate (WER) and character error rate (CER), the two most widely used assessment metrics in speech recognition, were used to assess the performance of the suggested model. The outcomes were 2.42% CER and 8.34% WER.

3. OPEN RESEARCH CHALLENGES IN MODELLING OF QURANIC RECITATION

Building a robust, deep acoustic model specifically for Quranic recitation involves various complex issues requiring careful attention. The challenges in building such a model are as follows, as summarized in Table 3:

3.1 Diverse Arabic Dialects

Accurate transcription of pronunciation and intonation is complex due to the large variety of Arabic dialects and geographical differences. The intricate tapestry of Arabic dialects and regional accents must be considered while building a robust, deep acoustic model for Quranic recitation. This problem requires a thorough strategy incorporating the subtle phonetic variations of different recitation traditions. It takes careful linguistic study and contextual adaptation to accommodate this dialectal diversity while staying true to traditional recitation methods. Sadat et al. (37) utilized Naive Bayes classifiers, and their character n-gram Markov language model achieved 98% accuracy on 18 different Arabic dialects.

3.2 Noise and Audio Purification

Background noise can reduce the accuracy of the model's output on audio recordings. The difficulty of maintaining the original quality of recordings of Quranic recitation arises from the widespread presence of noise in audio data. Advanced noise reduction techniques must be used to capture deep vocal expressions while successfully overcoming audible interference. Balancing noise elimination against preserving the evocative details of the recitation is a complex but crucial endeavor.

Almisreb et al. (34) proposed removing noise while recording recitation. The research evaluates the effectiveness of the
necessitating a flexible model.
Data Augmentation and Training | (39) | Data augmentation is necessary to include diverse recitation conditions. | Apply augmentation to the filter bank coefficients directly: warp the features, mask blocks of frequency channels, and mask blocks of time steps.
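The SpecAugment-style recipe cited in the table row above (39) operates directly on the filter bank features. A minimal numpy sketch of the two masking operations (time warping, SpecAugment's third operation, is omitted here); the mask widths, random seed, and feature dimensions are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def spec_augment(features: np.ndarray, max_f: int = 8, max_t: int = 10) -> np.ndarray:
    """Zero out one random block of frequency channels and one of time steps.

    features: (time, freq) array of log-mel filter bank coefficients.
    """
    out = features.copy()
    n_t, n_f = out.shape
    f = rng.integers(0, max_f + 1)      # width of the frequency mask (may be 0)
    f0 = rng.integers(0, n_f - f + 1)   # first masked channel
    out[:, f0:f0 + f] = 0.0
    t = rng.integers(0, max_t + 1)      # width of the time mask (may be 0)
    t0 = rng.integers(0, n_t - t + 1)   # first masked frame
    out[t0:t0 + t, :] = 0.0
    return out

fbank = rng.standard_normal((100, 40))  # 100 frames x 40 mel channels
augmented = spec_augment(fbank)
print(fbank.shape == augmented.shape)   # True: masking preserves the shape
```

Because the masks are applied to the features rather than the waveform, each training epoch sees a differently corrupted copy of the same recitation, which is what makes this augmentation cheap enough to cover diverse recitation conditions.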
[2] A. Wahdan, S. Hantoobi, S. A. Salloum, and K. Shaalan, "A systematic review of text classification research based on deep learning models in Arabic language," Int. J. Electr. Comput. Eng., vol. 10, no. 6, pp. 6629-6643, 2020.
[3] D. J. Stewart, "Approaches to the investigation of speech genres in the Qur'an," Journal of Qur'anic Studies, vol. 24, no. 1, pp. 1-45, 2022.
[4] M. H. Bashir, A. M. Azmi, H. Nawaz, W. Zaghouani, M. Diab, A. Al-Fuqaha, and J. Qadir, "Arabic natural language processing for Quranic research: A systematic review," Artificial Intelligence Review, vol. 56, no. 7, pp. 6801-6854, 2023.
[5] M. S. Abdo, A. H. Kandil, and S. A. Fawzy, "MFC peak based segmentation for continuous Arabic audio signal," in 2nd Middle East Conference on Biomedical Engineering. IEEE, 2014, pp. 224-227.
[6] M. Bezoui, A. Elmoutaouakkil, and A. Benihssane, "Feature extraction of some Quranic recitation using mel-frequency cepstral coefficients (MFCC)," in 2016 5th International Conference on Multimedia Computing and Systems (ICMCS). IEEE, 2016, pp. 127-131.
[7] A. Meftah, Y. A. Alotaibi, and S.-A. Selouani, "A comparative study of different speech features for Arabic phonemes classification," in 2016 European Modelling Symposium (EMS). IEEE, 2016, pp. 47-52.
[8] M. N. Aulia, M. S. Mubarok, W. U. Novia, F. Nhita, et al., "A comparative study of MFCC-KNN and LPC-KNN for Hijaiyyah letters pronunciation classification system," in 2017 5th International Conference on Information and Communication Technology (ICoICT). IEEE, 2017, pp. 1-5.
[9] R. U. Khan, A. M. Qamar, and M. Hadwan, "Quranic reciter recognition: a machine learning approach," Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 6, pp. 173-176, 2019.
[10] S. M. Shah and S. N. Ahsan, "Arabic speaker identification system using a combination of DWT and LPC features," in 2014 International Conference on Open Source Systems & Technologies. IEEE, 2014, pp. 176-181.
[11] A. Elnagar, R. Ismail, B. Alattas, and A. Alfalasi, "Automatic classification of reciters of Quranic audio clips," in 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA). IEEE, 2018, pp. 1-6.
[12] A. Qayyum, S. Latif, and J. Qadir, "Quran reciter identification: A deep learning approach," in 2018 7th International Conference on Computer and Communication Engineering (ICCCE). IEEE, 2018, pp. 492-497.
[13] T. S. Gunawan, N. A. M. Saleh, and M. Kartiwi, "Development of Quranic reciter identification system using MFCC and GMM classifier," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 1, pp. 372-378, 2018.
[14] A. M. Basahel, A. A. Abi Sen, M. Yamin, N. M. Bahbouh, and S. Basahel, "A smart flexible tool to improve reading skill based on m-learning," in 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2022, pp. 411-414.
[15] I. Ahsiah, N. Noor, and M. Idris, "Tajweed checking system to support recitation," in 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS). IEEE, 2013, pp. 189-193.
[16] E. Yosrita and A. Haris, "Identify the accuracy of the recitation of al-Quran reading verses with the science of tajwid with mel-frequency cepstral coefficients method," in 2017 International Symposium on Electronics and Smart Devices (ISESD). IEEE, 2017, pp. 179-183.
[17] T. Altalmas, W. Sediono, N. N. W. N. Hashim, S. Ahmad, and S. Khairuddin, "Analysis of two adjacent articulation Quranic letters based on MFCC and DTW," in 2018 7th International Conference on Computer and Communication Engineering (ICCCE). IEEE, 2018, pp. 187-191.
[18] F. Alqadheeb, A. Asif, and H. F. Ahmad, "Correct pronunciation detection for classical Arabic phonemes using deep learning," in 2021 International Conference of Women in Data Science at Taif University (WiDSTaif). IEEE, 2021, pp. 1-6.
[19] R. A. Rajagede and R. P. Hastuti, "Al-Quran recitation verification for memorization test using siamese LSTM network," Communications in Science and Technology, vol. 6, no. 1, pp. 35-40, 2021.
[20] A. M. A. Alqadasi, M. S. Sunar, S. Turaev, R. Abdulghafor, M. S. Hj Salam, A. A. S. Alashbi, A. A. Salem, and M. A. Ali, "Rule-based embedded HMMs phoneme classification to improve Quranic recitation recognition," Electronics, vol. 12, no. 1, p. 176, 2022.
[21] D. Omran, S. Fawzi, and A. Kandil, "Automatic detection of some Tajweed rules," in 2023 20th Learning and Technology Conference (L&T). IEEE, 2023, pp. 157-160.
[22] F. Ahmad, S. Z. Yahya, Z. Saad, and A. R. Ahmad, "Tajweed classification using artificial neural network," in 2018 International Conference on Smart Communications and Networking (SmartNets). IEEE, 2018, pp. 1-4.
[23] L. Marlina, C. Wardoyo, W. M. Sanjaya, D. Anggraeni, S. F. Dewi, A. Roziqin, and S. Maryanti, "Makhraj recognition of hijaiyah letter for children based on mel-frequency cepstrum coefficients (MFCC) and support vector machines (SVM) method," in 2018 International Conference on Information and Communications Technology (ICOIACT). IEEE, 2018, pp. 935-940.
[24] M. Irfan, I. Z. Mutaqin, and R. G. Utomo, "Implementation of dynamic time warping algorithm on an Android-based application to write and pronounce hijaiyah letters," in 2016 4th International Conference on Cyber and IT Service Management. IEEE, 2016, pp. 1-6.
[25] N. W. Arshad, M. Z. Ibrahim, R. A. Karim, Y. A. Wahab, N. F. Zakaria, and T. T. Muda, "Signal-based feature extraction for makhraj emission point classification," in Engineering Technology International Conference (ETIC 2022), vol. 2022. IET, 2022, pp. 19-25.
[26] J. Farooq and M. Imran, "Mispronunciation detection in articulation points of Arabic letters using machine learning," in 2021 International Conference on Computing, Electronic and Electrical Engineering (ICE Cube). IEEE, 2021, pp. 1-6.
[27] B. Yousfi and A. M. Zeki, "Holy Qur'an speech recognition system Imaalah checking rule for Warsh recitation," in 2017 IEEE 13th International Colloquium on Signal Processing & Its Applications (CSPA). IEEE, 2017, pp. 258-263.
[28] B. Yousfi, A. M. Zeki, and A. Haji, "Isolated Iqlab checking rules based on speech recognition system," in 2017 8th International Conference on Information Technology (ICIT). IEEE, 2017, pp. 619-624.
[29] F. Thirafi and D. P. Lestari, "Hybrid HMM-BLSTM-based acoustic modeling for automatic speech recognition on Quran recitation," in 2018 International Conference on Asian Language Processing (IALP). IEEE, 2018, pp. 203-208.
[30] M. Hadwan, H. A. Alsayadi, and S. AL-Hagree, "An end-to-end transformer-based automatic speech recognition for Quran reciters," Computers, Materials & Continua, vol. 74, no. 2, 2023.
[31] B. Yousfi and A. M. Zeki, "Holy Qur'an speech recognition system distinguishing the type of recitation," in 2016 7th International Conference on Computer Science and Information Technology (CSIT). IEEE, 2016, pp. 1-6.
[32] A. Nur, S. Syarifandi, S. Amin, et al., "Implementation of text mining classification as a model in the conclusion of tafsir bil ma'tsur and bil ra'yi contents," Int. J. Eng. Adv. Technol., vol. 9, no. 1, pp. 2789-2795, 2019.
[33] S. Shahriar and U. Tariq, "Classifying maqams of Quranic recitations using deep learning," IEEE Access, vol. 9, pp. 117271-117281, 2021.
[34] A. Abd Almisreb, A. F. Abidin, and N. M. Tahir, "Noise effects on the recognition rate of Arabic phonemes based on Malay speakers," in 2014 IEEE Symposium on
Industrial Electronics & Applications (ISIEA). IEEE, 2014, pp. 1-6.
[35] S. Khairuddin, S. Ahmad, A. H. Embong, N. N. W. N. Hashim, and S. S. Hassan, "Features identification and classification of alphabet (ro) in leaning (al-inhiraf) and repetition (al-takrir) characteristics," in 2019 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS). IEEE, 2019, pp. 295-299.
[36] M. I. E.-K. Ghembaza, O. Tayan, and K. S. Aloufi, "Qurani Rafiqui: an interactive context-aware Quranic application for smartphones," in 2018 1st International Conference on Computer Applications & Information Security (ICCAIS). IEEE, 2018, pp. 1-6.
[37] F. Sadat, F. Kazemi, and A. Farzindar, "Automatic identification of Arabic dialects in social media," in Proceedings of the First International Workshop on Social Media Retrieval and Analysis, 2014, pp. 35-40.
[38] K. Nahar, R. Al-Khatib, M. Al-Shannaq, and M. M. Barhoush, "An efficient Holy Quran recitation recognizer based on SVM learning model," Jordanian Journal of Computers and Information Technology (JJCIT), vol. 6, no. 4, pp. 394-414, 2020.
[39] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A simple data augmentation method for automatic speech recognition," arXiv preprint arXiv:1904.08779, 2019.
[40] D. Omran, S. Fawzi, and A. Kandil, "Automatic detection of some Tajweed rules," in 2023 20th Learning and Technology Conference (L&T). IEEE, January 2023, pp. 157-160.
[41] R. M. Siregar, B. Satria, A. Prayogi, M. A. S. Pane, E. E. Awal, and Y. R. Sari, "Identification of Tajweed recognition using Wavelet Packet Adaptive Network based on Fuzzy Inference Systems (WPANFIS)," Internet of Things and Artificial Intelligence Journal, vol. 4, no. 1, pp. 32-41, 2024.
[42] A. F. Ghori, A. Waheed, M. Waqas, A. Mehmood, and S. A. Ali, "Acoustic modelling using deep learning for Quran recitation assistance," International Journal of Speech Technology, vol. 26, no. 1, pp. 113-121, 2023.
[43] H. M. Mahmudin and H. Akbar, "Qur'an recitation correction system using DeepSpeech," Indonesian Journal of Multidisciplinary Science, vol. 2, no. 11, pp. 4010-4022, 2023.
[44] A. Al Harere and K. Al Jallad, "Quran recitation recognition using end-to-end deep learning," arXiv preprint arXiv:2305.07034, 2023.
[45] A. A. Harere and K. A. Jallad, "Mispronunciation detection of basic Quranic recitation rules using deep learning," arXiv preprint arXiv:2305.06429, 2023.

Muhammad Aleem Shakeel is an electrical engineer with a strong background in artificial intelligence and autonomous systems. He completed his Bachelor's degree in Electrical Engineering at the University of Engineering and Technology (UET), Taxila. Eager to learn more about AI, he pursued a Master's degree in Electrical Engineering at the National University of Sciences and Technology (NUST) in Islamabad. He is with Invoice Mate, a company that offers AI-based invoicing solutions, designing and implementing AI technologies that improve the efficiency and accuracy of invoicing processes. His research interests include Machine Learning, Deep Learning, Generative AI, and NLP for speech signals and documents. Email: muhammad.aleem227@gmail.com

Hasan Ali Khattak (Senior Member, IEEE) received his Ph.D. in Electrical and Computer Engineering from Politecnico di Bari, Bari, Italy, in April 2015, a master's degree in information engineering from Politecnico di Torino, Torino, Italy, in 2011, and a B.CS. degree in Computer Science from the University of Peshawar, Peshawar, Pakistan, in 2006. He has been an associate professor at the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Pakistan, since October 2020. His current research interests focus on Future Internet architectures, such as the Web of Things, and on leveraging Data Sciences and Social Engineering for Future Smart Cities. Email: hasan.alikhattak@seecs.edu.pk

Numan Khurshid obtained his Ph.D. in Electrical Engineering, specializing in Artificial Intelligence, from Lahore University of Management Sciences (LUMS), Lahore. He is currently an Assistant Professor at the Department of Electrical Engineering, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan. His research interests include Machine Learning, Deep Learning, and Generative AI for remote sensing images and speech signals. Email: numan.khurshid@seecs.edu.pk