RV College of Engineering®
(Autonomous Institution Affiliated to VTU, Belagavi)
REAL TIME LANGUAGE TRANSLATION
           Experiential Learning Report
                  Submitted by
Hanisha R - 1RV22CV031
Shreya Chakote - 1RV22CS189
Vanshika Khandelwal - 1RV22CS224
     COMPUTER SCIENCE & ENGINEERING
Contents
1. Introduction
2. Objectives
3. Methodology
4. Patents and papers
5. Implementation
6. Conclusion
INTRODUCTION
Real-time language translation glasses are a transformative solution poised to
revolutionize communication on a global scale. These cutting-edge spectacles
employ state-of-the-art technology to instantaneously interpret spoken language
into the wearer's native tongue, offering seamless communication across
linguistic boundaries.
Imagine traveling to foreign lands and effortlessly
conversing with locals, conducting business negotiations
with ease, or connecting with people from different
cultures without the constraints of language barriers.
These glasses redefine the way we communicate,
fostering greater understanding and cooperation among
people worldwide.
PATENT - US 2015/0120276 A1
LINK: https://patents.google.com/patent/US20150120276A1/en
METHODOLOGY AND COMPONENTS:
Hardware Components:
Frame: Holds all the components together.
Pair of Glasses: Lenses held by the frame, potentially translucent.
Input Unit: Allows user interaction, including buttons for taking
photos and selecting languages.
Camera Unit: Captures images of text that need translation.
Projection Device: Rotatably coupled to the frame, used to
project translated text.
Software Components:
Camera Control Module: Activates and controls the camera unit
to capture images of text.
Word Identification Module: Identifies the words shown in the
captured image.
Translation Module: Translates the identified words from the
initial language into the target language.
Display Control Unit: Controls the projection device to display
the translated words onto a surface.
User Interaction: Users interact with the intelligent glasses
through the input unit, which includes buttons for taking photos
and selecting target languages. When the user presses the photo
button, the camera unit captures an image of the text needing
translation.
METHODOLOGY:
Image Processing and Translation: The captured image
is processed by the camera control module and word
identification module to identify the text. The identified
text is then translated using the translation module from
the initial language to the target language selected by the
user.
Projection of Translated Text: Once translated, the
display control unit directs the projection device to
display the translated text onto a surface. This could
include projecting onto the lenses of the glasses
themselves or onto another surface, depending on the
design and user preferences.
Hardware and Software Integration: The hardware and
software components work together seamlessly to
provide the translation functionality in real-time,
allowing users to understand foreign text
instantaneously.
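To make this integration concrete, the following is a minimal Python sketch of the capture, identify, translate, and project sequence described above. The library choices (OpenCV for the camera unit, pytesseract for word identification) and the placeholder translation hook are illustrative assumptions and are not specified in the patent.

# Minimal sketch of the patent's capture -> identify -> translate -> project
# flow. The library choices (OpenCV, pytesseract) are illustrative assumptions
# and are not specified in the patent.
import cv2
import pytesseract

def capture_image(camera_index=0):
    """Camera control module: grab one frame from the camera unit."""
    camera = cv2.VideoCapture(camera_index)
    ok, frame = camera.read()
    camera.release()
    if not ok:
        raise RuntimeError("camera unit did not return a frame")
    return frame

def identify_words(frame):
    """Word identification module: OCR the captured image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)

def translate_words(text, target_language):
    """Translation module: placeholder hook for a real translation backend."""
    # A deployed system would call a translation service here; returning the
    # text unchanged keeps the sketch self-contained.
    return text

def display_translation(text):
    """Display control unit: stand-in for the projection device."""
    print(text)

if __name__ == "__main__":
    frame = capture_image()
    words = identify_words(frame)
    display_translation(translate_words(words, target_language="hi"))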
                          PAPER
LINK: https://www.ripublication.com/ijaer18/ijaerv13n9_90.pdf
METHODOLOGY AND COMPONENTS:
➔ The proposed system consists of four main modules:
   ● data acquisition
   ● pre-processing
   ● feature extraction
   ● sign recognition
➔ Two main approaches for sign language recognition are described:
1. Glove-based approaches
Glove-based approaches involve wearing a sensor glove, which simplifies the
segmentation process but requires additional hardware.
2. Vision-based approaches
Vision-based approaches use image processing algorithms to detect and track
hand signs and facial expressions. They are preferred for their ease of use and
the absence of additional hardware requirements, but they may have accuracy
issues that need to be addressed.
➔ Principal Component Analysis (PCA) is mentioned as the main feature
   extraction technique used in the proposed system.
➔ ALGORITHM USED FOR SIGN RECOGNITION
● The Linear Discriminant Analysis (LDA) algorithm is used for sign
   recognition; it involves dimensionality reduction and comparing gestures
   using Euclidean distance.
● Training and recognition phases are described, where gestures are
   represented as vectors, normalized, projected onto the gesture space, and
   compared for recognition.
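As a concrete illustration of the training and recognition phases described above, the sketch below flattens gesture images into vectors, normalizes them, projects them onto a low-dimensional gesture space, and classifies a query by the nearest training gesture in Euclidean distance. The use of scikit-learn's PCA (the paper's main feature extraction technique) and the assumed array shapes are illustrative choices; LDA could be substituted in the same structure.

# Sketch of the training/recognition phases: flatten gesture images to vectors,
# normalize them, project them onto a low-dimensional gesture space, and
# classify by the nearest training gesture in Euclidean distance. Array shapes
# and the use of scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

def train(gesture_images, labels, n_components=20):
    """gesture_images: array of shape (n_samples, height, width), grayscale."""
    vectors = gesture_images.reshape(len(gesture_images), -1).astype(float)
    vectors /= 255.0                       # normalize pixel intensities
    pca = PCA(n_components=n_components)   # build the gesture space
    projected = pca.fit_transform(vectors)
    return pca, projected, np.asarray(labels)

def recognize(image, pca, projected_train, labels):
    """Project a query gesture and return the label of the nearest neighbour."""
    vector = image.reshape(1, -1).astype(float) / 255.0
    query = pca.transform(vector)
    distances = np.linalg.norm(projected_train - query, axis=1)
    return labels[int(np.argmin(distances))]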
 OUR APPROACH AND IMPROVEMENTS
● Instantly translating spoken language, sign language,
  and written text into your desired language.
● Recognition technology accurately interpreting sign
  language gestures, providing real-time translation for
  both deaf and hearing individuals.
● Utilizing Optical Character Recognition (OCR)
  technology to translate written text, such as signs,
  menus, and documents, enhancing accessibility in
  diverse environments.
● Advanced speech recognition capabilities capture
  spoken words with precision, enabling accurate
  translation into the target language in real-time.
● User-friendly design with touch-sensitive controls and
  voice commands for easy operation, making it
  accessible to users of all technical backgrounds.
● Integrating with Wi-Fi or cellular networks for access
  to online translation services, expanding language
  options and improving translation accuracy.
         HARDWARE COMPONENTS
● Frame, pair of glasses
● Raspberry Pi 4 Model B
● Spoken input: USB or I2S microphone
● Camera unit: To detect text and sign language
● Memory unit: language database
● Speakers to output the translated audio
         SOFTWARE COMPONENTS
● Operating system installed on the Raspberry Pi,
  such as Raspberry Pi OS or Ubuntu MATE
● Speech Recognition Software
● Language Translation Software
● Audio Processing Software: for noise
  reduction and echo cancellation
● Connectivity Software: such as Wi-Fi or
  Bluetooth
● Gesture Recognition Software
● Gesture Database
                   WORKING
1. Initialization:
The Raspberry Pi boots up and initializes all necessary hardware components,
including the camera module, microphone, speaker/headphones, and display
module.
2. Input Acquisition:
The camera module captures a real-time video feed of the user's hand gestures,
and the microphone captures spoken language input from the user.
3. Gesture Recognition:
The captured video frames are processed using image processing algorithms
implemented on the Raspberry Pi. Image segmentation techniques are applied to
isolate the hand region from the background. Feature extraction algorithms,
such as contour detection and keypoint extraction, are used to identify
relevant hand gestures.
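A minimal sketch of this step, assuming the hand is segmented with a rough HSV skin-color threshold and taken to be the largest contour in the mask; both assumptions are simplifications for illustration and are not prescribed by the report.

# Sketch of the gesture-recognition step: segment the hand region and extract
# its contour. The HSV skin-color range and the "largest contour is the hand"
# rule are simplifying assumptions.
import cv2
import numpy as np

def extract_hand_contour(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Rough skin-color mask; the bounds need tuning for lighting and skin tone.
    mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)  # assume the hand is the largest region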
4. Speech Recognition:
The spoken language input captured by the microphone is processed using speech
recognition software libraries or services. The speech recognition algorithms
convert the spoken language into text, which serves as the input for the
translation process.
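A minimal sketch of this step using the Python SpeechRecognition package with its Google Web Speech backend; the report only mentions speech recognition software libraries or services, so this particular package and backend are assumptions.

# Sketch of the speech-recognition step using the SpeechRecognition package;
# the package and the Google Web Speech backend are illustrative choices.
import speech_recognition as sr

def transcribe_from_microphone(language="en-IN"):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # basic noise handling
        audio = recognizer.listen(source)            # capture one utterance
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible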
5. Language Translation:
The recognized sign language gestures and the transcribed spoken language text
are passed to the translation system. Language translation software, such as
the Google Translate API or Microsoft Translator, translates the spoken
language text into the desired target language. For sign language translation,
a database of sign language gestures mapped to corresponding spoken language
translations is used to translate recognized gestures into text format.
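A minimal sketch of this step, pairing a hypothetical gesture-to-phrase database with the unofficial googletrans package; both the database contents and the package choice are illustrative assumptions (the report names the Google Translate API or Microsoft Translator as possible services).

# Sketch of the translation step. GESTURE_DB is a hypothetical gesture-to-phrase
# mapping; googletrans is an unofficial client used purely for illustration.
from googletrans import Translator

GESTURE_DB = {           # hypothetical gesture database
    "thumbs_up": "yes",
    "flat_palm": "stop",
}

translator = Translator()

def translate_speech(text, target="hi"):
    """Translate transcribed speech into the target language code."""
    return translator.translate(text, dest=target).text

def translate_gesture(gesture_label, target="hi"):
    """Map a recognized gesture to a phrase, then translate it."""
    phrase = GESTURE_DB.get(gesture_label, "")
    return translate_speech(phrase, target) if phrase else ""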
6. Output Generation:
The translated text, from both the spoken language and sign language inputs,
is displayed on the wearable display module.
7. User Interaction:
The user interacts with the system through an app, adjusting modes and
receiving real-time feedback for a tailored translation experience.
8. Feedback and Optimization:
The system provides feedback to the user in the form of visual and auditory
cues, confirming successful translation and providing assistance in case of
errors or misunderstandings.
IMPLEMENTATION
[Figures: input image, camera setup, and text and audio output]
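As the conclusion notes, the implemented prototype combines OpenCV, pytesseract, and gTTS; the sketch below shows a minimal image-to-text-to-speech pipeline in that style. The file names and language code are placeholders.

# Minimal sketch of the implemented image -> text -> speech pipeline using
# OpenCV, pytesseract, and gTTS. File names and the language code are
# placeholders.
import cv2
import pytesseract
from gtts import gTTS

def image_to_speech(image_path, audio_path="output.mp3", language="en"):
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)       # simple preprocessing
    text = pytesseract.image_to_string(gray).strip()     # OCR step
    if text:
        gTTS(text=text, lang=language).save(audio_path)  # speech synthesis
    return text

if __name__ == "__main__":
    print(image_to_speech("sample_sign.jpg"))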
CONCLUSION
In conclusion, the implementation of the Image to Text and Text to Speech
translator represents a significant step forward in bridging the gap between
visual content and accessibility for individuals with visual impairments.
Through the integration of the OpenCV, pytesseract, and gTTS libraries, we have
developed a robust system capable of extracting text from images and converting
it into speech in a seamless manner.
While the current implementation demonstrates functional capabilities, there is
scope for further refinement and expansion. Future iterations could focus on
enhancing OCR accuracy through advanced preprocessing techniques, optimizing
speech synthesis for improved naturalness and clarity, and exploring additional
features such as multilingual support and compatibility with diverse image
formats.
Overall, the Image to Text and Text to Speech
translator project underscores the potential of
technology to empower individuals with
disabilities, improve accessibility across
diverse contexts, and contribute to the creation
of more inclusive digital environments. With
continued innovation and refinement, such
systems have the capacity to make a
meaningful difference in the lives of users
worldwide.