Submitted by
ADELINE M
215229146
Guided by
Ms. G. Rajalaxmi, M.Sc.
Department of Data Science
Bishop Heber College (Autonomous)
TIRUCHIRAPPALLI 620017
APRIL 2023
DECLARATION
I hereby declare that the project work presented here is originally done by me under the guidance of
Ms. G. Rajalaxmi, M.Sc., Assistant Professor, Department of Data Science, Bishop Heber
College (Autonomous), Tiruchirappalli 620017, and has not been included in any other project
report submitted for the award of any degree.
Batch : 2021-2023
TIRUCHIRAPPALLI 620017
Date:
BONAFIDE CERTIFICATE
This is to certify that the project work titled “BUILDING A TRANSLITERATION SYSTEM USING
LSTM & GRU” is a bonafide record of the project work done by Adeline M, 215229146, in
partial fulfillment of the requirements for the award of the degree of MASTER OF SCIENCE.
The Viva-Voce examination for the candidate Adeline.M, 215229146, was held on
Examiners:
1.
2.
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude and deep thanks to Dr. K. RAJKUMAR,
M.Sc., M.Phil., Ph.D., Associate Professor and Head of the Department of Data Science, who
has been a source of encouragement and moral strength throughout my study period.
This project would not have been possible without the motivation and guidance of my
internal guide and class in-charge, Ms. G. RAJALAXMI, M.Sc., Assistant Professor,
Department of Data Science. Her encouragement and moral support were the backbone of this
project and helped me carry it through to completion.
I am extremely thankful to my family members and my dear friends who helped a lot in
the completion of the project.
ADELINE M
ABSTRACT
Transliteration is the process of mapping text from one writing system to another based
on sound similarity. In a machine translation system, transliteration is mostly employed to
handle named entities and words that are not in the lexicon. For instance, in Hindi transliteration,
a user can type "thanyavaath" to get "थन्यवाथ", which sounds like "thanyavaath". Transliteration is simply
intended to change the letters or characters of a source language into the corresponding letters of the
target language. It is useful when a user knows a language but cannot write in its script. Hindi is
India's 'lingua franca', and transliteration allows people to read and write words and names across
different languages.
Transliteration is designed to change only the letters or characters of a source language
into the corresponding letters of the target language. Unlike translation, which converts the written
or spoken meaning of words or text from a source language into a target language, it does not carry
over meaning. In this work, an English word is mapped to the corresponding word in Devanagari (Hindi)
script using two RNNs, an encoder and a decoder model. Transliteration is particularly effective in
the case of names, and the accuracy of the model can be further increased when a sequence-to-sequence
approach is used in combination with the encoder-decoder model. Transliteration from English
to Hindi plays a very important role, as Hindi is an official language of India, and much data that
exists in Hindi has to be converted to English for worldwide use.
vi
TABLE OF CONTENTS
Chapter Title Page No
Abstract vi
List of Figures vii
List of Tables viii
1 Introduction 01
1.1 Motivation
1.2 Existing Systems and Solutions
1.3 Product Needs and Proposed System
1.4 Product Development Timeline
2 Literature Review 04
2.1 Machine Learning Approach on Transliteration System
2.2 Deep Learning Approach on Transliteration system
2.3 Machine Translation Approach on Transliteration
2.4 NLP Approach on Transliteration System
3 Data Collection 08
3.1 Description of the Data
3.2 Source and Methods of Collecting Data
4 Preprocessing and Feature Selection 11
4.1 Overview of Preprocessing Methods
4.2 Overview of Feature Selection Methods
4.3 Preprocessing and Feature Selection Steps
5 Model Development 16
5.1 Model Architecture
5.2 Algorithms Applied
5.3 Training Overview
6 Experimental Design and Evaluation 22
6.1 Experimental Design
6.2 Experimental Evaluation
6.3 Customer Evaluation and Feedback
7 Model Optimization 25
7.1 Overview of Model Tuning and Best Parameter Selection
7.2 Model Tuning Process and Experiments
8 User Interface Design and Evaluation 30
8.1 Designing Graphical User Interface
8.2 Testing Graphical User Interface
9 Product Delivery and Deployment 32
10 Conclusion 33
10.1 Summary
10.2 Limitation and Future Work
References 35
Appendix-A: Data Set
Appendix-B: Source Code
Appendix-C: Output Screenshots
vii
LIST OF FIGURES
Figure No. Description Page
viii
LIST OF TABLES
Table No. Description Page
ix
Chapter 1
INTRODUCTION
1.1 Motivation
Transliteration may also be useful in situations where there is more than one spelling
system for the same language, such as conversion between simplified and traditional systems or
between the Cyrillic and Latin alphabets, while preserving meaning and pronunciation. Transliteration can
also make it easier to communicate and understand among people who speak different languages
and use different writing systems. By converting text from a known language into an unfamiliar one,
the system helps users read the other language and become more interested in pronouncing it than in
understanding it. This motivates the development of an automatic system for performing English to Hindi
transliteration, a method of converting a written word into another language using the alphabet of that
second language.
Many transliteration systems exist which have been trained to predict the target words.
Transliteration is important for communication and understanding across different languages and writing
systems. Many automatic transliteration algorithms have been created in recent years using
statistical and linguistic techniques, as well as phonetic models of the source and target languages. One
such resource is the International Phonetic Alphabet (IPA), which is used for transcribing speech sounds
in any language and can also be used for transliterating between writing systems. ASCII transliteration
uses a combination of Latin letters and punctuation marks to represent the sounds of the original
writing system.
1
1.2 Existing Systems and Solutions
Google Transliteration offers a transliteration service which allows users to type in one language
and automatically converts the text into the writing system of another language. The service supports
a wide variety of languages and writing systems. Such tools enable users to enter text in one
writing system and have it converted to another writing system in real time; one of the most common
online transliteration tools is Transliteration.com. With recent advancements in machine learning and
natural language processing, there are also many machine learning based solutions for transliteration,
the most common being transformer-based and Seq2Seq transliteration models.
1.3 Product Needs and Proposed System
The proposed system consists of an input layer for character embeddings, an encoder RNN that reads
the input character sequence (English), and a decoder RNN that takes the last state of the encoder as
input and produces the output character sequence (Devanagari). The backpropagation algorithm
calculates the gradient of the error function; backpropagation refers to a group of methods used to
train artificial neural networks efficiently with gradient descent by exploiting the chain rule. The
gradient descent algorithm is then used to update the weights effectively.
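As a rough illustration of this architecture, the following is a minimal sketch of a character-level encoder-decoder in Keras; the vocabulary sizes (num_eng_chars, num_dev_chars) and the embedding and hidden dimensions are placeholder assumptions, not values taken from the final model.

import tensorflow as tf
from tensorflow import keras

num_eng_chars, num_dev_chars = 30, 130   # assumed vocabulary sizes
embed_dim, hidden_dim = 64, 256          # assumed layer sizes

# Encoder: embeds the English character sequence and returns its final state
enc_in = keras.Input(shape=(None,))
enc_emb = keras.layers.Embedding(num_eng_chars, embed_dim)(enc_in)
_, enc_state = keras.layers.GRU(hidden_dim, return_state=True)(enc_emb)

# Decoder: starts from the encoder state and predicts one Devanagari character per step
dec_in = keras.Input(shape=(None,))
dec_emb = keras.layers.Embedding(num_dev_chars, embed_dim)(dec_in)
dec_out, _ = keras.layers.GRU(hidden_dim, return_sequences=True,
                              return_state=True)(dec_emb, initial_state=enc_state)
dec_pred = keras.layers.Dense(num_dev_chars, activation="softmax")(dec_out)

model = keras.Model([enc_in, dec_in], dec_pred)
model.compile(optimizer="adam", loss="categorical_crossentropy")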
2
1.4 Product Development Timeline
3
Chapter 2
Literature Review
P. J. Antony et al. suggested a Support Vector Machine-based English to Kannada transliteration system.
The suggested system employs a two-step sequence-labelling method for transliteration: the
source string is segmented into transliteration units in the first stage, and source and target
transliteration units are aligned in the second. It also resolves many alignment and unit
mapping combinations. The entire procedure is broken down into three stages: preprocessing, SVM
training, and transliteration. The training file is transformed into the SVM-compatible format
during the preparation stage. For SVM training, the authors employ a database of 40,000
location names in India. [8]
Abbas Malik, Laurent Besacier, Christian Boitet and Pushpak Bhattacharyya proposed an
Urdu to Hindi transliteration system using a hybrid approach in 2009. This hybrid approach combines
finite state machine-based techniques with a statistical word language model and achieved better
performance. A main aim of this system was the handling of diacritical marks in the Urdu input
text. The system improved accuracy by 28.3% compared to their previous finite-state transliteration
model. [11]
Arbabi et al. used neural networks and knowledge-based systems to create an Arabic-English
transliteration system. In this approach, the initial step was to enter names taken from a
telephone directory into a database. A knowledge-based technique is then utilised to
vowelize these names in order to add the short vowels that are missing, since short vowels
are typically not printed in Arabic script. The words that the knowledge-based system is unable to
vowelize appropriately are subsequently filtered out using an artificial neural network. The cascade
correlation method, a supervised feed-forward neural processing methodology, is used to train the
network. The neural network is thus used to assess the names' validity in terms of Arabic
syllabification, and it produces binary output. [3]
Sanjana Shree and Anand Kumar presented a deep learning-based system for machine
transliteration between Tamil and English. The system uses a Deep Belief Network (DBN), a
generative graphical model. The data in both languages is encoded as sparse binary matrices, and
every word has character padding added at the end to keep the word length consistent when it is
encoded. A Deep Belief Network is composed of multiple layers of Restricted Boltzmann
Machines, a type of Boltzmann Machine and random Markov field. [7]
Andy Way, Sudip Kumar Naskar, Sandipan Dandapat, Ankit Kumar Srivastava, and Rejwanul
Haque (CNGL) put forward a context-informed phrase-based statistical machine transliteration system
for English to Hindi in 2009. In a rule-based MT (RBMT) system, by contrast, the source text is
transformed into an intermediate representation according to syntactic, lexical, and semantic rules,
from which the target language text is subsequently generated. Instead of translating phrases as in
sentence-level translation systems, the suggested transliteration system was modelled by translating
letters. They employed a memory-based classification framework that makes it possible to estimate
these features well while avoiding issues with noisy data. [13]
Wan and Verspoor proposed a system for automatic English-Chinese name transliteration.
The system transliterated words using pronunciation: the spoken form of the word was used to map
the written English word to written Chinese characters, with each phoneme in an English word mapped
to a corresponding Chinese character. Five steps made up the transliteration process: semantic
abstraction, syllabification, sub-syllable division, mapping to Pinyin, and mapping to Han characters.
To determine which parts of the word should be translated or transliterated, the preprocessing step
known as semantic abstraction looked up the word in dictionaries. [2]
5
Deep and Goyal created a rule-based Punjabi to English transliteration method for common names.
Character sequence mapping rules are used to translate across the languages in the proposed system,
and the rules are created with certain constraints to increase accuracy. After being trained on 1,013
person names, the system was evaluated using various person names, city names, river names, etc.,
and recorded an overall accuracy of 93.22%. [5]
Lehal and Saini created a Perso-Arabic to Indic script machine transliteration model. It is a hybrid
transliteration system that blends rules with word- and character-level language models. The system
has undergone successful testing on Sindhi, Urdu, and Punjabi and is easily extendable to
new languages such as Kashmiri and Konkani. The transliteration accuracy for the three scripts ranges
from 91.68% to 97.75%, the best accuracy for Perso-Arabic to Indic script pairs reported in the
literature at the time. [10]
Gurpreet Singh Josan and Jagroop Kaur created a Punjabi to Hindi transliteration system
using a statistical approach. This approach attempted to improve upon a simple transfer baseline
using statistical techniques. Instead of decoding words as in word-level translation systems,
phrase-based SMT (PB-SMT) algorithms are applied to character sequences for transliteration. The
grow-diag-final alignment heuristic was used in the transliteration model training step, while other
parameters kept their default values. They developed the system using freely available SMT tools and
a Punjabi-Hindi corpus for training. [12]
Taraka Rama and Karthik Gali, in 2009, handled the transliteration problem as a translation
problem. For this work, the researchers utilized phrase-based SMT algorithms. This method
developed a transliteration system using a beam search-based decoder together with GIZA++, both of
which are freely available. A word-aligned English-Hindi corpus is used to train the system, and
the prototype reports a test accuracy of 46%. [15]
Dhore et al. suggested transliterating names from Hindi to English using conditional random
fields (CRF). The program receives Devanagari-written Hindi place names as input and transliterates
them into English. The data is represented as n-gram features. The objective is to create an English
transliteration of a Hindi name using the statistical probability framework of CRF and an n-gram
feature set. The suggested strategy was tested using a bilingual corpus of named items gathered from
books and online sources. The system achieved an excellent bigram accuracy of 85.79% with Hindi as
the source language. [4]
Harshit Surana and Anil Kumar Singh developed a transliteration scheme for Telugu and Hindi,
two Indian languages. They used character-based n-grams to determine whether a word was Indian or
foreign before classifying it as such. The likelihood of the word's origin was calculated using
symmetric cross entropy, and different transliteration methods were applied to the different classes
(Indian or foreign) based on this probability value. For the transliteration of an Indian word, the
method first divided the word into segments based on potential vowel and consonant combinations,
and then used various principles to map these segments to their closest letter combinations. These
steps produce transliteration candidates, which are then ranked and filtered using fuzzy string
matching. The target word is then produced by matching the transliteration candidates to words in
the corpus of the target language. [6]
Mathur and Saxena created a hybrid system for English-Hindi named entity transliteration.
The algorithm first analyses English words and applies rules for phoneme extraction. Each English
phoneme is then translated into its comparable Hindi phoneme using a statistical approach. The
authors extracted 42,371 named entities using Stanford's NER for named entity extraction; rules were
applied to these entities and phonemes were extracted. A database of English-Hindi phonemes was
created after these English phonemes were transliterated into Hindi. [9]
Amitava Das, Asif Ekbal, Tapabrata Mandal, and Sivaji Bandyopadhyay tackled the
transliteration problem in 2009. In the suggested approach, the transliteration problem was framed as
the letter-to-phoneme subtask of text-to-speech analysis. Without making additional adjustments, the
researchers applied an off-the-shelf, discriminative letter-to-phoneme system to the problem. In this
experiment, they showed that such an automated letter-to-phoneme converter works effectively for
transliteration without any task-specific modifications. [14]
7
Chapter 3
DATA COLLECTION
3.1 Description of the Data
The Dakshina dataset includes word pairs in English and several Indian languages, including
Bengali, Tamil, and Hindi, along with their transliterated equivalents. The Hindi lexicon used here
consists of about 2 lakh (200,000) words, split 80-20 into 80% for training and 20% for testing. The
dataset was developed to build and analyze machine learning models for transliteration tasks, and it
is among the larger publicly available transliteration datasets, with more than 1.5-million word
pairings overall. Training, development, and testing portions make up the dataset.
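As a rough sketch of how such a lexicon can be read and split, the snippet below loads a Dakshina romanization lexicon (tab-separated: Devanagari word, romanization, count) and performs an 80-20 split; the file path is an assumption based on the dataset's published layout rather than the exact path used in the project notebook.

import random

# Assumed path inside the extracted Dakshina archive
lexicon_path = "dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.train.tsv"

pairs = []
with open(lexicon_path, encoding="utf-8") as f:
    for line in f:
        native, roman, _count = line.rstrip("\n").split("\t")
        pairs.append((roman, native))   # input = Latin romanization, target = Devanagari

random.shuffle(pairs)
split = int(0.8 * len(pairs))
train_pairs, test_pairs = pairs[:split], pairs[split:]
print(len(train_pairs), "training pairs,", len(test_pairs), "testing pairs")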
8
Training Set
Devanagari script is used to write several South Asian languages, including Hindi, Nepali,
Marathi, and Sanskrit. The training portion contains more than 1 lakh (100,000) words. The training
set for Devanagari script is a collection of text samples used to train a machine learning model to
recognize and classify Devanagari characters. In the case of text-based recognition systems, the
training set may consist of thousands of lines of text written in Devanagari script. Each line of text
is labelled with the corresponding transcription or translation in another language, which serves as
the ground truth for the machine learning algorithm.
The size and composition of the training set can greatly impact the accuracy and
performance of the machine learning model. A larger and more diverse training set can lead to
better results, but also requires more computational resources and time to train. Therefore, creating
an effective training set requires careful consideration of the intended application and available
resources.
9
Testing Set
The Dakshina dataset is a large-scale, multi-script, multi-domain dataset for Indian language
understanding research. The testing set in the Dakshina dataset is the subset used to evaluate the
performance and accuracy of machine learning models trained on the training set. It is carefully
selected to represent a diverse range of domains and genres, such as news, social media, and
literature, and covers multiple scripts and languages, including Devanagari, Tamil, and Telugu.
The dataset is annotated for tasks such as named entity recognition, part-of-speech tagging, and
sentiment analysis.
For example, for the named entity recognition task, the testing set consists of text samples
that have been manually annotated with the named entities present in the text, such as person names,
organization names, and location names.
3.2 Source and Methods of Collecting Data
The dataset was obtained from the publicly available Dakshina repository:
https://github.com/google-research-datasets/dakshina
10
Chapter 4
PREPROCESSING AND FEATURE SELECTION
4.1 Overview of Preprocessing Methods
Text Normalization: This involves converting the input text into a standardized format. In
Devanagari, this may involve converting different character variations, such as matras, to their
standard form.
Tokenization: This involves breaking the input text into individual words or tokens. In Devanagari,
this may involve segmenting words based on spaces or other delimiters.
input_data_characters = set()
target_data_characters = set()
for line in train_data_lines[:lenk]:
    target_data, input_data, _ = line.split("\t")
    for ch in input_data:
        if ch not in input_data_characters:
            input_data_characters.add(ch)
    for ch in target_data:
        if ch not in target_data_characters:
            target_data_characters.add(ch)
Stop word Removal: This involves removing common words that are unlikely to contribute to the
transliteration process. In Devanagari, this may involve removing common function words such as
"हैं" (are) or "वाला" (of).
11
# Remove all Hindi non-letters
def cleanHindiVocab(line):
    line = line.replace('-', ' ').replace(',', ' ')
    cleaned_line = ''
    for char in line:
        if char in hindi_alpha2index or char == ' ':
            cleaned_line += char
    return cleaned_line.split()
Stemming: This involves reducing words to their root form, to improve the efficiency and accuracy
of the machine learning model. In Devanagari, this may involve removing suffixes such as "ता"
(ness) or "वाली" (of), as sketched below.
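A minimal sketch of such suffix stripping is shown below; the suffix list is only an illustrative assumption, not an exhaustive Hindi stemmer.

# Illustrative (assumed) list of Devanagari suffixes to strip
HINDI_SUFFIXES = ["ता", "वाली", "वाला"]

def strip_suffix(word):
    # Remove the first matching suffix, if any, to approximate a root form
    for suffix in HINDI_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word

print(strip_suffix("मानवता"))   # -> मानव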
Feature Engineering: This involves selecting and engineering features that are most relevant to the
transliteration task. In Devanagari, this may involve using features such as phonetic similarity,
syllable structure, or character n-grams.
Overall, preprocessing methods can greatly improve the performance and accuracy of
machine learning systems for Devanagari transliteration. The choice of preprocessing methods
depends on the specific task and dataset at hand.
12
4.2 Overview of Feature Selection Methods
In the context of developing deep learning models based on LSTM and GRU for Devanagari
transliteration, feature selection plays an important role in improving the accuracy and efficiency
of the models. The goal of feature selection is to identify the most relevant features that can
accurately predict the transliteration output. Common feature selection methods used in
LSTM and GRU models for Devanagari transliteration include the following.
Embedding Layer
This involves using an embedding layer to learn a dense representation of the input text. The
embedding layer maps each character or character sequence to a low-dimensional vector, which is
learned during training. This allows the model to capture the semantic meaning of the input text,
which can improve the accuracy of the transliteration output.
(encoder_input_data, decoder_input_data, decoder_target_data,
 num_encoder_tokens, num_decoder_tokens, input_token_idx, target_token_idx,
 encoder_max_length, decoder_max_length) = embed_train_data(train_data_lines)

(val_encoder_input_data, val_decoder_input_data, val_decoder_target_data,
 target_token_idx, val_target_data) = embed_val_data(
    val_data_lines, num_decoder_tokens, input_token_idx, target_token_idx)
Attention Mechanism
This involves using an attention mechanism to selectively attend to certain parts of the input
text. The attention mechanism weights the importance of each input feature based on its relevance
to the transliteration task. This can improve the efficiency of the model by reducing the number of
irrelevant features.
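As a hedged sketch of such an attention mechanism, the snippet below shows one way a dot-product attention layer could be wired between encoder and decoder outputs in Keras; the tensor names and the hidden size of 256 are placeholder assumptions, not the exact configuration of the final model.

from tensorflow import keras

# Assumed shapes: enc_seq and dec_seq are (batch, time, hidden) outputs of the
# encoder and decoder RNNs, produced with return_sequences=True.
enc_in = keras.Input(shape=(None, 256), name="encoder_states")
dec_in = keras.Input(shape=(None, 256), name="decoder_states")

# Dot-product attention: each decoder step attends over all encoder steps
context = keras.layers.Attention()([dec_in, enc_in])

# Concatenate the attention context with the decoder state before the output layer
merged = keras.layers.Concatenate()([dec_in, context])
attn_demo = keras.Model([enc_in, dec_in], merged)
attn_demo.summary()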
Dropout Regularization
This involves randomly dropping out some input features during training to prevent
overfitting. This can improve the generalization of the model and prevent it from memorizing the
training data.
13
4.3 Preprocessing and Feature Selection Steps
In the case of building a transliteration system using a Devanagari dataset, there are two
critical steps: preprocessing and feature selection. Preprocessing is the initial step in building a
transliteration system; its goal is to transform the dataset into a format that can be easily processed
by the deep learning algorithms.
Tokenization: Split the text into individual words or characters to create a sequence.
Normalization: Convert the text to a standard format, such as converting uppercase to lowercase and
removing diacritics.
Feature Selection
The second step in building a transliteration system is feature selection, the process of
selecting the most relevant features from the pre-processed data. The goal of feature selection is to
reduce the number of features to improve the accuracy of the model and reduce the time and
computational resources required to train the model.
14
Feature selection includes the following steps
Feature Extraction: Extract relevant features from the preprocessed data. In the case of transliteration,
features may include the presence of specific characters, the frequency of character combinations, and
the position of characters within the word.
Character Mapping: Character mapping is the process of mapping input characters to output
characters. This is important in transliteration systems because the input and output characters may
not be the same. An excerpt of the Hindi character-to-index mapping (hindi_alpha2index) used here is
shown below.
(Excerpt of hindi_alpha2index.) The mapping assigns every character of the Devanagari Unicode block
(U+0900 to U+097F) an index from 0 to 127, e.g. {'ऄ': 4, 'अ': 5, 'आ': 6, 'इ': 7, ..., 'क': 21,
'ख': 22, ..., 'ह': 57, ..., '।': 100, '०': 102, ..., 'ॿ': 127}; the combining vowel signs
(matras) in this block do not render as standalone characters in print.
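A minimal sketch of how such a mapping can be generated programmatically is given below, assuming the Devanagari Unicode block is used directly; the exact construction in the original notebook may differ.

# Map every code point in the Devanagari Unicode block (U+0900-U+097F) to an index
hindi_alpha2index = {chr(0x0900 + offset): offset for offset in range(128)}
print(hindi_alpha2index['अ'])   # -> 5, since U+0905 is the sixth code point in the block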
eng_alphabets = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
pad_char = '-PAD-'
eng_alpha2index_r = {}
for index, alpha in enumerate(eng_alphabets):
    eng_alpha2index_r[alpha] = index
print(eng_alpha2index_r)
{'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6, 'H': 7, 'I': 8, 'J': 9, 'K': 10, 'L': 11, 'M': 12, 'N': 13, 'O':
14, 'P': 15, 'Q': 16, 'R': 17, 'S': 18, 'T': 19, 'U': 20, 'V': 21, 'W': 22, 'X': 23, 'Y': 24, 'Z': 25}
15
Chapter 5
MODEL DEVELOPMENT
Figure 5.1: Model Architecture
5.1 Model Architecture
The model follows an encoder-decoder design: the encoder reads the input character sequence and
compresses it into a hidden state that serves as the decoder's starting point and provides details
about the input sequence.
The decoder then creates the output sequence one token at a time using the encoder's
hidden state. At each time step the decoder generates the next token by using the previous token in
the output sequence and its own hidden state, and this hidden state is updated at each step based on
the prior hidden state and the current input token.
The model is tuned during training to reduce the discrepancy between the intended output
sequence and the predicted output sequence; the specific transliteration task determines the loss
function to be used. The encoder-decoder architecture with LSTM or GRU units is a powerful method
for transliteration that can manage complex sequences and yield precise results.
Decoder = Embedding layer (target language) + RNN layer + Dense layer (to predict
next character)
Vanilla model
In contrast to traditional feed-forward neural networks, recurrent neural networks are a type
of network architecture that accepts variable-length inputs and produces variable-length outputs.
Sequence to Sequence
17
5.2 Algorithms Applied
The LSTM model consists of multiple LSTM cells, which are capable of remembering and
forgetting information over long periods. Each LSTM cell has three gates: an input gate, an output
gate, and a forget gate, which control the flow of information. Transliteration involves changing a
text's writing system, and using an encoder-decoder architecture with LSTM or GRU units is one
typical machine learning method for it.
Figure 5.2: LSTM Architecture
18
Figure 5.3: GRU Architecture
The optimizer algorithm could be Adam or SGD; the optimizer determines how the
model's weights are updated during training to minimize the loss function. RMSprop (Root
Mean Square Propagation) is an adaptive learning rate optimization algorithm used for training
artificial neural networks. It is designed to improve upon the problems of the Adagrad optimizer,
which adapts the learning rate to each individual parameter but accumulates squared gradients over
the entire training process. RMSprop addresses this by introducing an exponentially weighted moving
average of the squared gradients.
The algorithm keeps a running average of the squared gradients over time and divides
the current gradient by the root mean square (RMS) of this average. A decay factor controls the
weighted moving average of the squared gradients, epsilon is a small constant to avoid division by
zero, and the learning rate is a hyperparameter. The RMSprop optimizer is effective at dealing
with the problem of vanishing or exploding gradients, which can occur when training deep neural
networks. It has become a popular choice for optimizing deep neural networks in a variety of
applications, including computer vision, natural language processing, and speech recognition.
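A minimal NumPy sketch of the RMSprop update just described is shown below; the decay factor, epsilon, and learning rate values are illustrative assumptions rather than the ones used in training.

import numpy as np

def rmsprop_step(param, grad, avg_sq_grad, lr=0.001, decay=0.9, eps=1e-8):
    # Exponentially weighted moving average of squared gradients
    avg_sq_grad = decay * avg_sq_grad + (1 - decay) * grad ** 2
    # Scale the update by the root mean square of recent gradients
    param = param - lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return param, avg_sq_grad

w, cache = np.array([0.5, -0.3]), np.zeros(2)
g = np.array([0.1, -0.2])                      # example gradient
w, cache = rmsprop_step(w, g, cache)
print(w)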
The loss function could be categorical cross-entropy, which measures the difference between
the model's predictions and the true values; the goal is to minimize this loss during training.
Categorical cross-entropy is a common loss function used for training neural networks in this task.
It measures the difference between the predicted probability distribution and the true probability
distribution of the output. In a transliteration system, the output can be represented as a sequence
of characters or phonemes in the target writing system.
Let's assume that we have a dataset of pairs of words in the source writing system and their
corresponding transliterated versions in the target writing system. The goal is to train a neural
network to predict the target transliteration given a source word. For each word in the dataset, the
network produces a probability distribution over the possible target transliterations. The categorical
cross-entropy loss is then calculated by comparing this predicted probability distribution with the
true probability distribution. The true distribution can be represented as a one-hot vector, where the
element corresponding to the correct transliteration is set to 1 and all other elements are set to 0.
In other words, the loss function penalizes the network for assigning low probabilities to
the correct transliteration and high probabilities to incorrect transliterations. During training, the
network adjusts its parameters to minimize the average categorical cross-entropy loss over the entire
dataset. This encourages the network to make more accurate predictions for new words in the future.
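As a small illustration of this loss, the snippet below computes the categorical cross-entropy for one output position with an assumed three-character vocabulary; the probabilities are made-up numbers, not model outputs.

import numpy as np

y_true = np.array([0.0, 1.0, 0.0])        # one-hot target: the second character is correct
y_pred = np.array([0.1, 0.7, 0.2])        # assumed predicted distribution

loss = -np.sum(y_true * np.log(y_pred))   # categorical cross-entropy
print(round(loss, 4))                     # -ln(0.7) ~= 0.3567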
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy", metrics=['accuracy'])
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs)
20
Evaluation metrics are used to measure the model's performance during and after
training, and they help to assess how well the model can generalize to new data. Once the model
has been compiled, it is ready to be trained on the dataset; the training process involves iteratively
adjusting the weights to minimize the loss function. Once training is complete, the model can be
used to transliterate text from one script to another.
21
Chapter 6
EXPERIMENTAL DESIGN AND EVALUATION
6.1 Experimental Design
Step 1: Input Layer: The input layer consists of a sequence of characters in the source
language (Devanagari script).
Step 2: Embedding Layer: The embedding layer converts the input sequence into a dense
vector representation.
Step 3: LSTM Layer: The LSTM layer processes the input sequence and generates a sequence
of hidden states, which capture the context and meaning of the input.
GRU Layer: The GRU layer processes the input sequence and generates a sequence of
hidden states, which capture the context and meaning of the input.
Step 4: Dense Layer: The dense layer maps the hidden states to characters of the target language
(Latin script).
Step 5: Output Layer: The output layer generates the predicted sequence of characters in the
target language.
Both LSTM and GRU models can be trained using the backpropagation algorithm with the
cross-entropy loss function. During training, the model learns to minimize the difference between
the predicted sequence and the ground-truth sequence in the target language. Once the model is
trained, it can be used to transliterate new sequences of characters from the source language to the
target language; a minimal sketch of the layer stack described above follows.
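The following is a hedged sketch of the Step 1-5 stack as a single Keras model; the vocabulary sizes and layer dimensions are placeholder assumptions, and the full system actually splits this stack into separate encoder and decoder networks.

from tensorflow import keras

src_vocab, tgt_vocab = 130, 30   # assumed source (Devanagari) and target (Latin) sizes

model = keras.Sequential([
    keras.Input(shape=(None,)),                              # Step 1: character indices
    keras.layers.Embedding(src_vocab, 64),                   # Step 2: dense vectors
    keras.layers.LSTM(256, return_sequences=True),           # Step 3: hidden states (or GRU)
    keras.layers.Dense(tgt_vocab, activation="softmax"),     # Steps 4-5: target characters
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()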
22
6.2 Experimental Results
The performance of the LSTM and GRU cell types was superior to the simple RNN, and LSTM
outperformed GRU. Two-layer encoding and three-layer decoding produce good results; a model
should learn the dataset more effectively with a higher number of layers. Most of the time, a learning
rate of 0.001 results in good performance. Adding dropout, rather than using zero dropout, enhances
the model and speeds up computation, and increasing the embedding size layer by layer also improves
performance. The best test accuracy obtained for these hyperparameters in the seq2seq model is
34.273% (for exact string match).
23
6.3 CUSTOMER EVALUATION AND FEEDBACK
The proposed system attempts to correctly transliterate the given input and report its output
reliably, thereby reducing manual effort.
EVALUATION
The transliteration system should be able to accurately convert text from one script to
another without errors; customers would expect the system to produce accurate, error-free results.
The system should also convert text quickly: customers would expect it to be fast and efficient,
especially when processing a large amount of text. The transliteration system should be easy to use
and understand, with a user-friendly interface that is easy to navigate.
The transliteration system should also be customizable according to the needs of the
customer; customers would expect the system to be flexible and adaptable, and to have a responsive
and helpful support team that can assist with any issues or questions that arise. Overall, the customer
evaluation for a transliteration system depends on how well the system meets these specific needs
and requirements.
FEEDBACK
24
Chapter 7
MODEL OPTIMIZATION
7.1 Overview of Model Tuning and Best Parameter Selection
In a transliteration system using GRU, the batch size refers to the number of samples used to
update the model at once. When training a neural network, the training data is usually divided into
batches, which are processed one at a time; this is done for computational efficiency and to allow
the use of parallel processing. The batch size determines how many samples are used to compute
the gradient of the loss function, which is used to update the model's weights during training. A
larger batch size will generally result in more stable updates and faster convergence, but it may also
require more memory and computational power. In the model.fit function, you can specify the
batch_size parameter to control the batch size used during training.
For example: model.fit(X_train, y_train, batch_size=128, epochs=10) trains the model using
batches of 128 samples at a time for 10 epochs. You can experiment with different batch sizes to see
how they affect the performance of your model.
train, enc, dec = build_model(units=256, dense_size=512, enc_layers=2, dec_layers=3,
                              cell="GRU", embedding_dim=64)
train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                                mode='max', save_best_only=True, verbose=1)
train.fit([trainx, trainxx], trainy,
          batch_size=128,
          validation_data=([valx, valxx], valy),
          epochs=5,
          callbacks=[checkpoint])
25
During the model training process, it is essential to monitor the performance of
the model on a separate dataset to ensure that it is not overfitting. This dataset is called the validation
dataset, and it is used to evaluate the model's performance after each epoch. The validation dataset
should be different from the training dataset and should not be used for training the model. The
purpose of the validation dataset is to evaluate the model's performance on unseen data and to
prevent overfitting.
In the model.fit function, you can specify the validation data using the validation_data
parameter, as in the sketch below: it trains the model on the training data X_train and y_train with a
batch size of 32 for 10 epochs, and after each epoch the model's performance is evaluated on the
validation data X_val and y_val. An epoch is a single pass through the entire training dataset; during
each epoch, the model is trained on all the samples in the training dataset.
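A minimal sketch of the call described above is given below; model, X_train, y_train, X_val, and y_val are assumed to be already defined, so this is illustrative rather than the project's exact training call.

# Assumed: `model` is a compiled Keras model and the arrays below already exist
history = model.fit(
    X_train, y_train,
    batch_size=32,                      # 32 samples per weight update
    epochs=10,                          # 10 passes over the training data
    validation_data=(X_val, y_val))     # evaluated after every epoch

print(history.history['val_loss'])      # one validation loss value per epoch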
The number of epochs determines how many times the model will iterate over the entire
training dataset. In the model.fit function, you can specify the number of epochs using the epochs
parameter. During each epoch, the model iterates over all the samples in X_train and y_train, and
after each epoch its performance is evaluated on the validation data X_val and y_val. You can
experiment with different values for the number of epochs to find the optimal number of iterations
required for the model to achieve good performance.
26
7.2 Model Tuning Process and Experiments
Early Stopping
Early stopping is a technique used during the training of deep learning models, including
transliteration systems, to prevent overfitting and improve generalization performance. Early
stopping works by monitoring the performance of the model on a validation dataset during the training
process. The validation dataset is a set of examples that the model has not seen before and is used to
assess the generalization performance of the model. During training, the performance of the model on
the validation dataset is evaluated after each epoch (i.e., one pass through the training dataset). If the
performance on the validation dataset does not improve for a certain number of consecutive epochs,
training is stopped early, and the model with the best performance on the validation dataset is
saved.
Stopping the training early prevents the model from continuing to learn patterns in the
training data that may not generalize well to new examples. Instead, we keep the model that
performs best on the validation dataset, which is a better indicator of its generalization
performance. Overall, early stopping is an effective technique to improve the generalization
performance of transliteration systems and prevent overfitting to the training data.
# Early Stopping
# Config is a variable
train, enc, dec = build_model(units=units,
                              dense_size=dense_size, enc_layers=enc_layers,
                              dec_layers=dec_layers, cell=cell,
                              dropout=dropout,
                              embedding_dim=embedding_dim)
train.compile(optimizer=Adam(learning_rate=learning_rate),
              loss='categorical_crossentropy', metrics=['accuracy'])
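The snippet above only builds and compiles the model; the early stopping callback itself, taken in essence from the training code in Appendix B, can be attached to the fit call roughly as follows (the optional WandbCallback logging hook used there is omitted here).

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 5 consecutive epochs
earlyStopping = EarlyStopping(monitor='val_loss', patience=5, verbose=0, mode='min')

# `checkpoint` is the ModelCheckpoint callback defined earlier in this chapter
train.fit([trainx, trainxx], trainy,
          batch_size=batch_size,
          validation_data=([valx, valxx], valy),
          epochs=10,
          callbacks=[earlyStopping, checkpoint])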
27
The model gives better performance when configured with the following parameters:
Configuration Values
Batch-size 128
Cell GRU
Decoder-layers 2
Dense-size 512
Dropout 0.2
Embedding 64
Encoder-layers 1
Learning-rate 0.001
Units 256
The model was trained on the Hindi language corpus from the Dakshina dataset. The
hyperparameters and the values searched over in order to find the best setting are sketched below:
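The search space below is a condensed sketch of the Weights & Biases sweep configuration given in Appendix B; the random search maximizes validation accuracy over these values.

sweep_config = {
    'method': 'random',                       # random search over the grid below
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},
    'parameters': {
        'learning_rate':  {'values': [0.01, 0.001]},
        'dense_size':     {'values': [64, 128, 512]},
        'dropout':        {'values': [0.0, 0.2, 0.4]},
        'units':          {'values': [64, 128, 256]},
        'batch_size':     {'values': [64, 128, 256]},
        'cell':           {'values': ["LSTM", "GRU", "RNN"]},
        'embedding_size': {'values': [64, 128, 256]},
        'enc_layers':     {'values': [1, 2, 3]},
        'dec_layers':     {'values': [1, 2, 3]},
    },
}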
28
Figure 7.3: Model Accuracy
29
Chapter 8
USER INTERFACE DESIGN AND EVALUATION
8.1 Designing Graphical User Interface
Installation and Setup: The manual should provide instructions for installing and setting up
the system on a user's computer or device. Flask and the other required libraries and packages are
installed inside a virtual environment, which is created and activated first. Following a common
convention, the dependencies needed for the transliteration system are listed in a requirements.txt
file, and the full Python program is placed in a file called app.py, which serves as the entry point
for the Flask application. The manual also explains how to input text into the system and how the
output will be displayed. This may include information on supported languages and character sets,
and on how specific characters or combinations of characters are handled.
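As a hedged illustration of such an app.py entry point, the following minimal Flask sketch exposes one transliteration endpoint; the transliterate() helper is a placeholder assumption standing in for the trained encoder-decoder model.

from flask import Flask, request, jsonify

app = Flask(__name__)

def transliterate(word):
    # Placeholder: in the real system this would run the trained encoder-decoder model
    return word

@app.route("/transliterate", methods=["POST"])
def transliterate_route():
    text = request.json.get("text", "")
    return jsonify({"input": text, "output": transliterate(text)})

if __name__ == "__main__":
    app.run(debug=True)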
8.2 Testing Graphical User Interface
Functional Testing: Functional testing is important to ensure that the system works
correctly and without errors. This involves testing the system's functionality, such as checking
whether the system correctly handles special characters and punctuation marks, whether it can
handle different input formats, and whether it is capable of handling long texts.
Compatibility testing is essential to ensure that the GUI is compatible with different
platforms and browsers, as well as with different screen sizes and resolutions. Compatibility testing
helps to ensure that users can access the transliteration system from different devices and platforms
without any issues. Performance testing examines the system's behaviour under different conditions,
such as high traffic or when handling large input files; it helps to identify any bottlenecks in the
system that may affect its performance and to optimize the system to improve its speed and
reliability.
Overall, testing a GUI for a transliteration system involves evaluating the system's input
and output accuracy, usability, functionality, compatibility, and performance. Proper testing
ensures that the system is efficient, reliable, and provides users with a seamless experience while
using the transliteration system.
31
Chapter 9
PRODUCT DELIVERY AND DEPLOYMENT
The delivery schedule for the transliteration system using Flask would typically involve
the following stages:
Development: The development stage involves creating the transliteration system using the Flask
framework. This may include setting up the Flask environment, creating the necessary modules and
components, and testing the system for functionality and performance.
Testing: Once the system is developed, it must be tested thoroughly to ensure that it meets the project
requirements and functions as intended. This may involve testing for errors, bugs, and other issues,
as well as validating the accuracy of the transliteration system.
Deployment: Once the system has been tested and validated, it can be deployed to the target
environment. This may involve setting up the necessary infrastructure, configuring the system, and
deploying the application to the production environment.
Maintenance: After the system is deployed, it must be maintained to ensure that it continues to
function properly and meet the changing needs of the users. This may involve performing regular
updates and maintenance tasks, as well as providing technical support to users who encounter issues
with the system. The delivery schedule for a transliteration system using Flask would typically
depend on the scope and complexity of the model. The delivery schedule is shown in the table below.
32
Chapter 10
CONCLUSION
This work concludes with an encoder-decoder model: the encoder is responsible for reading a word in
the source language and encoding it into an internal representation, and the decoder produces the
transliterated word in the target language using this encoded representation of the source word. The
entire encoded input is used as context for generating each step of the output. The RNN can focus
on the relevant parts of the input sequence when predicting each part of the output sequence, which
increases the quality of learning and also makes learning easier.
The models have learned to map English characters to their corresponding Hindi
characters and to generate accurate transliterations. The use of LSTM and GRU models for
English-to-Hindi transliteration is a promising approach that can help bridge the language gap
between these two languages and enable effective communication.
10.1 Summary
The purpose of a transliteration system is to enable people to read and write text in a language
they may not be familiar with by providing a phonetic representation of the original text.
Transliteration systems can be used for a variety of applications, including language translation and
communication between people who speak different languages. LSTM and GRU, used here to build
the transliteration system, are both types of recurrent neural networks designed to handle sequential
data such as text.
The training process for a transliteration system using LSTM and GRU involves feeding the
network large amounts of data in both the source and target writing systems. The network is
then able to learn the patterns and relationships between the two writing systems and use this
knowledge to generate accurate transliterations for new input data. These networks are able to retain
information over long periods of time, making them well suited for tasks such as transliteration,
where the output depends on the context and structure of the input text.
33
10.2 Limitations and Future Work
Transliteration systems are used for a variety of purposes, including facilitating
communication between speakers of different languages and helping to transcribe names and terms
into other languages.
Ambiguity: Many languages have multiple ways of representing the same sound or word, which
can make it difficult to determine the correct transliteration.
Lack of standardization: There are often multiple competing transliteration systems for a given
language, which can lead to confusion and inconsistencies in the way that names and terms are
written.
Artificial intelligence and machine learning: Advanced algorithms could be trained to recognize
patterns in language and improve the accuracy of transliteration.
Standardization: Efforts to establish a standard transliteration system for each language could
help to reduce confusion and ensure consistency across different applications.
Integration with other technologies: Transliteration systems could be integrated with other
technologies, such as speech recognition and natural language processing, to enable more
seamless communication across languages.
34
REFERENCES
[1] B.-J. Kang and K.-S. Choi,"Automatic Transliteration and Back-transliteration by Decision
Tree Learning," in LREC, 2000.
[3] M. Arbabi, S. M. Fischthal, V. C. Cheng, and E. Bart, "Algorithms for Arabic name
transliteration," IBM Journal of research and Development, vol. 38, pp. 183- 194, 1994.
[6] H. Surana and A. K. Singh, "A More Discerning and Adaptable Multilingual Transliteration
Mechanism for Indian Languages," in IJCNLP, 2008, pp. 64-71.
[7] P. Sanjanaashree, "Joint layer based deep learning framework for bilingual machine
transliteration," in Advances in Computing, Communications and Informatics (ICACCI), 2014
International Conference on, 2014, pp. 1737-1743.
[8] P. Antony, V. Ajith, and K. Soman, "Kernel method for English to Kannada transliteration," in
Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International
Conference on, 2010, pp. 336-338.
[9] S. Mathur and V. P. Saxena, "Hybrid approach to English-Hindi name entity transliteration,"
in Electrical, Electronics and Computer Science (SCEECS), 2014 IEEE Students' Conference
on, 2014, pp. 1-5.
35
[11] Abbas Malik, Laurent Besacier, Christian Boitet, and Pushpak Bhattacharyya, "A Hybrid Model
for Urdu Hindi Transliteration," Proceedings of the 2009 Named Entities Workshop, ACL-IJCNLP
2009, pages 177-185, Suntec, Singapore, 7 August 2009.
[12] Gurpreet Singh Josan and Jagroop Kaur, "Punjabi to Hindi Statistical Machine
Transliteration," International Journal of Information Technology & Knowledge Management
(IJITKM), April 2011.
[13] Andy Way, Sudip Kumar Naskar, Sandipan Dandapat, Ankit Kumar Srivastava, and
Rejwanul Haque (CNGL), context-informed phrase-based statistical machine transliteration for
English to Hindi, 2009.
[14] Amitava Das, Asif Ekbal, Tapabrata Mandal, and Sivaji Bandyopadhyay, applying a
letter-to-phoneme technique to the transliteration problem, 2009.
[15] Taraka Rama and Karthik Gali, "Modeling Machine Transliteration as a Phrase Based
Statistical Machine Translation Problem," Proceedings of the 2009 Named Entities Workshop,
ACL-IJCNLP 2009, pages 124-127, Suntec, Singapore, 7 August 2009.
36
APPENDIX-A
DATASET
अं an 3
अंकगणित ankganit 3
अंकल uncle 4
अंकु र ankur 4
अंकुरण ankuran 3
अंकु ररत ankurit 3
अंकु श aankush 1
अंकु श ankush 3
अंग ang 2
अंग anga 1
अंगद agandh 1
अंगद angad 2
अंगने angane 3
अंगभंग angbhang 3
अंगरक्षक angarakshak 1
अंगरक्षक angrakshak 2
अंगारा angara 3
अंगारे angaare 1
अंगारे angare 2
अंगी angi 3
अंगीकार angikar 3
अंगुठे anguthe 3
अंगुल angul 3
अंगुलियों anguliyon 3
अंगुली anguli 2
अंगुली ungli 1
अंगूठा angutha 3
अंगूठियों aanguthiyon 1
अंगूठियों anguthiyon 2
अंगूठी anguthi 3
अंगूठे anguthe 2
अंगूठे anguthon 1
अंगूर angoor 1
दु आओं duaon 2
37
दु कानदार dukaandaar 1
दु कानदार dukaardaar 1
दु कानदार dukandar 1
दु कानदारी dukaandaari 2
दु कानदारी dukandari 1
दुकानदारों dukaandaaron 2
दुकानदारों dukandaron 1
दु खद dukhad 5
दु खदाई dukhdaai 2
दु खदाई dukhdae 1
दुखने dukhne 3
दुखांत dukhaant 2
दुखांत dukhant
दुगना dugana 1
दुगना dugna 2
दुगनी dugani 1
दुगनी dugni 1
दुगनी duguni 1
दुगुनी duguni 3
दुग्ध dugdh 2
दुग्ध dugdha 1
दुधवा dudhavaa 2
दु धवा dudhwa 2
दु धारू dudhaaroo 1
दु धारू dudhaaru 1
दुधारू dudharu
दुनिया duniya 2
दुनिया duniyaa 2
दुनियां duniyaa 1
दुनियां duniyan 2
सिद्धि siddhi 6
सिद्धि sidhi 1
सिद्धियाँ siddhiyan
सिद्धियों siddhiyon 2
सिद्धियों siddiyon 1
सिद्धिविनायक siddhivinayak 3
सिद्धू siddhu 1
सिद्धू sidhu 2
सिद्धों siddhon 3
सिद्धांतों siddhanton 3
सिन scene 1
सिन sin 2
सिने cine 2
38
सिने sine 1
सिनेमा cinema 5
सिनेमा sinema 1
सिनेमाघर cinemaghar 3
सिनेमाघरों cinemagharon 3
सिन्हा sinha 4
सिप्पी sippy 3
सिप्ला cipla 2
सिप्ला sipla 1
सिफर sifar 2
सिफर siphar 2
हल hal 3
हलचल halchal 2
हलचल hulchul 1
हलद्वानी haldwani 3
हलफनामा halafnama
हलफनामा halfnama
हलाल halal 3
हल्क hulk 3
हल्का halka 3
हल्का hulka 1
हल्की halki 3
हल्की hulki 1
हल्के halke 3
हल्के hulke 1
हल्दीराम haldiram 2
हल्द्द्वानी haldrani 1
हल्द्द्वानी haldwani 2
हवन havan 2
हवन hawan 1
हवलदार havaldar
हवलदार hawaldar
हवा hava 1
हवा hawa 5
हवाई havai 1
हवाई hawai 2
39
APPENDIX-B
SOURCE CODE
# Required packages
import math
import random
import numpy as np
import pandas as pd
import tensorflow as tf
from keras import backend
from random import randrange
from tensorflow import keras
from google.colab import files
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from tensorflow.python.keras.models import load_model
from tensorflow.python.keras.callbacks import EarlyStopping
# Downloading dataset
!wget https://storage.googleapis.com/gresearch/dakshina/dakshina_dataset_v1.0.tar
!tar -xf 'dakshina_dataset_v1.0.tar'
# Fixed parameter
batch_size = 64
# We are using "tab" as the "start sequence" and "\n" as "end sequence".
target_data = "\t" + target_data + "\n"
train_input_data.append(input_data)
train_target_data.append(target_data)
# adding space
input_data_characters.add(" ")
target_data_characters.add(" ")
# sorting
input_data_characters = sorted(list(input_data_characters))
target_data_characters = sorted(list(target_data_characters))
def seq2seq(embedding_size, n_encoder_tokens, n_decoder_tokens, n_encoder_layers,
            n_decoder_layers, latent_dimension, cell_type, target_token_idx,
            decoder_max_length, reverse_target_char_index, dropout,
            encoder_input_data, decoder_input_data,
            decoder_target_data, batch_size, epochs):
    encoder_inputs = keras.Input(shape=(None,), name='encoder_input')
    encoder = None
    encoder_outputs = None
    state_h = None
    state_c = None
    e_layer = n_encoder_layers
# GRU
elif cell_type == "GRU":
    embed = tf.keras.layers.Embedding(input_dim=n_encoder_tokens,
        output_dim=embedding_size, name='encoder_embedding')(encoder_inputs)
    encoder = keras.layers.GRU(latent_dimension, return_state=True,
        return_sequences=True, name='encoder_hidden_1', dropout=dropout)
    encoder_outputs, state_h = encoder(embed)
# Inference Model
encoder_inputs = model.input[0]
encoder_outputs, state_h_enc = model.get_layer(
    'encoder_hidden_' + str(n_encoder_layers)).output
encoder_states = [state_h_enc]
encoder_model = keras.Model(encoder_inputs, encoder_states)
decoder_inputs = model.input[1]
decoder_outputs = model.get_layer('decoder_embedding')(decoder_inputs)
decoder_states_inputs = []
decoder_states = []
for j in range(1, n_decoder_layers + 1):
    decoder_state_input_h = keras.Input(shape=(latent_dimension,))
# LSTM
elif cell_type == "LSTM":
    embed = tf.keras.layers.Embedding(input_dim=n_encoder_tokens,
        output_dim=embedding_size, name='encoder_embedding')(encoder_inputs)
    encoder = keras.layers.LSTM(latent_dimension, return_state=True,
        return_sequences=True, name='encoder_hidden_1', dropout=dropout)
    encoder_outputs, state_h, state_c = encoder(embed)
    for i in range(2, e_layer + 1):
        layer_name = ('encoder_hidden_%d') % i
        encoder = keras.layers.LSTM(latent_dimension, return_state=True,
            return_sequences=True, name=layer_name, dropout=dropout)
        encoder_outputs, state_h, state_c = encoder(encoder_outputs,
            initial_state=[state_h, state_c])
def accuracy(val_encoder_input_data,
             val_target_data, n_decoder_layers, encoder_model, decoder_model,
             verbose=False):
    correct_count = 0
    total_count = 0
    n_val_data = len(val_encoder_input_data)
    for seq_idx in range(n_val_data):
# parameters
embedding_size = 256
# n_encoder_tokens = num_encoder_tokens
# n_decoder_tokens = num_decoder_tokens
n_encoder_layers = 3
n_decoder_layers = 3
latent_dimension = 512
cell_type = 'LSTM'
# target_token_idx = target_token_idx
# decoder_max_length = decoder_max_length
# reverse_target_char_index = reverse_target_char_index
dropout = 0.3
epochs = 2
# Best model
encoder_model, decoder_model = seq2seq(embedding_size,
    num_encoder_tokens, num_decoder_tokens, n_encoder_layers,
    n_decoder_layers, latent_dimension,
    cell_type, target_token_idx, decoder_max_length, reverse_target_char_index,
    dropout, encoder_input_data, decoder_input_data, decoder_target_data,
    batch_size, epochs)
# encoder_model.save('encoder_model.h5')
# decoder_model.save('decoder_model.h5')
# val_accuracy = accuracy(val_encoder_input_data,
#     val_target_data, n_decoder_layers, encoder_model, decoder_model)
# print('Validation accuracy: ', val_accuracy)
# val_accuracy = accuracy(val_encoder_input_data[0:subset],
#     val_target_data[0:subset], n_decoder_layers, encoder_model, decoder_model) if subset > 0 \
#     else accuracy(val_encoder_input_data,
#     val_target_data, n_decoder_layers, encoder_model, decoder_model)
print('Validation accuracy: ', val_accuracy)
# embedding test
test_input_data = []
test_target_data = []
for line in test_lines[: (len(test_lines) - 1)]:
    target_text, input_text, _ = line.split("\t")
    target_text = "\t" + target_text + "\n"
    test_input_data.append(input_text)
    test_target_data.append(target_text)
Seq2Seq model
# LSTM (Seq2Seq)
def build_model(cell="LSTM", units=32, enc_layers=1, dec_layers=1,
                embedding_dim=32, dense_size=32, dropout=None):
    keras.backend.clear_session()
    encoder_inputs = Input(shape=(None,))
    encoder_embedding = Embedding(input_dim=len(english_tokens) + 1,
                                  output_dim=embedding_dim, mask_zero=True)
    encoder_context = encoder_embedding(encoder_inputs)
    decoder_inputs = Input(shape=(None,))
    decoder_embedding = Embedding(input_dim=len(hindi_tokens) + 1,
                                  output_dim=embedding_dim, mask_zero=True)
    decoder_context = decoder_embedding(decoder_inputs)
    if cell == "LSTM":
        encoder_prev = [LSTM(units, return_sequences=True) for i in range(enc_layers - 1)]
        encoder_fin = LSTM(units, return_state=True)
        decoder = [LSTM(units, return_sequences=True, return_state=True)
                   for i in range(dec_layers)]
    # gru
    elif cell == "GRU":
        encoder_prev = [GRU(units, return_sequences=True) for i in range(enc_layers - 1)]
        encoder_fin = GRU(units, return_state=True)
        temp = encoder_context
        for lay in encoder_prev:
            temp = lay(temp)
        # (some lines omitted in this excerpt)
        temp, s = decoder[0](decoder_context, initial_state=state)
        for i in range(1, dec_layers):
            temp, s = decoder[i](temp, initial_state=state)
    elif cell == "RNN":
        encoder_prev = [SimpleRNN(units, return_sequences=True) for i in range(enc_layers - 1)]
        encoder_fin = SimpleRNN(units, return_state=True)
        temp = encoder_context
        for lay in encoder_prev:
            temp = lay(temp)
        for i in range(1, dec_layers):
            # (body omitted in this excerpt)
            pass
    dense_lay1 = Dense(dense_size, activation='relu')
    pre_out = dense_lay1(temp)
    final_output = dense_lay2(pre_out)
    decoder_model = Model(decoder_input_pass, [final_output] + state_outputs)
    return train, encoder_model, decoder_model
# Config is a variable
train, enc, dec = build_model(units=units,
                              dense_size=dense_size, enc_layers=enc_layers,
                              dec_layers=dec_layers, cell=cell,
                              dropout=dropout,
                              embedding_dim=embedding_dim)
train.compile(optimizer=Adam(learning_rate=learning_rate),
              loss='categorical_crossentropy', metrics=['accuracy'])
# Early Stopping
earlyStopping = EarlyStopping(monitor='val_loss', patience=5, verbose=0, mode='min')
checkpoint = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                                mode='max', save_best_only=True)
train.fit([trainx, trainxx], trainy,
          batch_size=batch_size,
          validation_data=([valx, valxx], valy),
          epochs=10,
          callbacks=[WandbCallback(), earlyStopping, checkpoint])
sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'val_accuracy',
        'goal': 'maximize'},
    'parameters': {
        'learning_rate': {'values': [0.01, 0.001]},
        'dense_size': {'values': [64, 128, 512]},
        'dropout': {'values': [0.0, 0.2, 0.4]},
        'units': {'values': [64, 128, 256]},
        'batch_size': {'values': [64, 128, 256]},
        'cell': {'values': ["LSTM", "GRU", "RNN"]},
        'embedding_size': {'values': [64, 128, 256]},
        'enc_layers': {'values': [1, 2, 3]},
        'dec_layers': {'values': [1, 2, 3]},
    }
}
sweep_id = wandb.sweep(sweep_config, entity="addy_15", project="v")
id = '0whn8jb2'
wandb.agent(id, train,entity="addy_15", project="v")
train, enc, dec = build_model(units=256, dense_size=512, enc_layers=2, dec_layers=3,
                              cell="GRU", embedding_dim=64)
train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
checkpoint = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                                mode='max', save_best_only=True, verbose=1)
train.fit([trainx, trainxx], trainy,
          batch_size=128,
          validation_data=([valx, valxx], valy),
          epochs=5,
          callbacks=[checkpoint])
46
APPENDIX-C
OUTPUT SCREENSHOTS
47