Rayat Shikshan Sanstha’s
Karmaveer Bhaurao Patil College of Engineering, Satara
Department : Computer Science & Engineering
Academic Year: 2024-25 Semester-V
Under the Guidance of
Prof. Anuja Jadhav
by
1. Arundhati Avinash Gadekar (22)
2. Awantika Satish Karpe (20)
3. Karan Vishnu Gite (52)
4. Suraj Sanjay Patil (53)
Motivation
The development of Jarvis is driven by the need for efficient, voice-controlled
systems that simplify tasks and boost productivity. By automating tasks like
opening apps, sending messages, and browsing the web using voice
commands, Jarvis saves time and effort. It also aims to be a cost-effective
alternative to commercial assistants like Siri or Alexa, using free APIs and
open-source tools. The inclusion of facial recognition adds an extra layer of
security, making it both user-friendly and secure.
Case Study
1. Personal Productivity: A software developer uses Jarvis to multitask by
opening apps, sending WhatsApp messages, and browsing the web through
voice commands, increasing efficiency.
2.Accessibility: For visually impaired users, Jarvis offers hands-free
interaction with devices using voice commands, making tasks like messaging
and browsing easier.
3.Customer Service Automation: A small business owner uses Jarvis to
handle customer inquiries and automate WhatsApp messages, improving
business efficiency without high costs.
Introduction
In the rapidly evolving digital communication landscape, automation tools are
becoming essential for enhancing user efficiency. This mini project focuses on
developing an AI-based voice assistant specifically for WhatsApp automation,
allowing users to perform various tasks through simple voice commands
sss
A standout feature of this voice assistant is the integration of facial recognition
technology for secure authentication. Additionally, the assistant utilizes
natural language processing powered by ChatGPT, enabling it to understand
and respond to user queries conversationally. This feature enhances user
experience by providing intuitive support and facilitating seamless
communication.
Overall, this project represents a significant innovation in automating and
securing WhatsApp interactions, making digital communication more efficient
and user-friendly.
Literature Survey
Voice assistants have evolved rapidly due to advancements in artificial
intelligence (AI) and natural language processing (NLP). Early developments
such as IBM’s Shoebox (1961) and Dragon Dictate (1990s) paved the way
for modern AI-based assistants like Siri (Apple, 2011), Google Assistant
(2016), and Amazon Alexa (2014). These systems rely on speech recognition
and natural language understanding to interpret and respond to user
commands.
Speech Recognition Technologies: Systems like Google’s Speech-to-Text
API and Microsoft’s Azure Speech Services have improved accuracy using
deep learning models like Deep Neural Networks (DNNs) and Recurrent
Neural Networks (RNNs). Research has shown that integrating such models
with language models like Transformer-based architectures (e.g., BERT,
GPT) enhances understanding and response quality.
Literature Survey
Text-to-Speech (TTS) has evolved from early concatenative synthesis to
modern neural TTS models such as WaveNet, which delivers more natural-
sounding speech. These innovations help bridge human-computer interaction
by producing highly intelligible and natural voice output.
Face Recognition for authentication has gained traction in securing voice
assistants, using technologies like convolutional neural networks (CNNs)
and feature-matching techniques for robust user identification. This helps
personalize and secure interactions.
Recent studies focus on improving conversational AI through models like
GPT-3 and open-source alternatives such as DialoGPT, emphasizing their
role in making interactions more dynamic and human-like.
Existing Systems
Amazon Alexa Google Assistant Apple Siri
• Description: Amazon • Description: Google • Description: Siri is
Alexa is a cloud-based Assistant is an AI- Apple’s voice-activated
voice service available powered virtual assistant available on
on Amazon Echo and assistant developed by iOS devices, including
other Alexa-enabled Google. It is available iPhones, iPads, and
devices. It can perform on smartphones, smart Macs.
tasks such as controlling speakers, and other • Capabilities: Voice
lights, adjusting connected devices command recognition,
thermostats, and • Capabilities: Advanced natural language
managing entertainment speech recognition, understanding, and
systems. natural language seamless integration
• Capabilities: Voice processing, and with Apple’s ecosystem.
recognition, natural integration with Google
language understanding, services and third-party
and integration with devices.
numerous third-party
smart devices.
Objectives
Develop a functional AI-based
voice assistant that can execute
Implement WhatsApp
commands, including converting
automation to send messages
text to speech, recognizing
programmatically based on voice
speech, and performing actions
commands.
like opening applications and
websites.
Integrate a free ChatGPT API
alternative to provide
conversational AI capabilities
without incurring high costs.
s
Proposed System
Auto Classification Speech-to-text
(detect wake words) (transcribe query)
Spoken Query
Text-to –Speech Language model
(synthesis speech) (generate response)
Spoken Answer
- On device
- On the cloud
Proposed System
Frontend View of Voice Assistant
System Architecture
User Interface(UI)
(Voice & Text Inputs)
Speech Recognition Modules –
(Speech-to-Text)
User Interface (UI)
Speech Recognition (Speech-to-
Command Processing Module Text)
(Logic and Task Execution) Command Processing Module
Text-to-Speech (TTS)
Application and Website Control
Face WhatsApp Automation
App Whatsapp ChatGPT API Alternative
TTS Control Automation Authenti-
cation Face Authentication
chatGPT API Alternative
(conversational AI)
Advantages
Cost-Effective &
Helps to Blind
easily handle by
people to send
non-technical
messages
person also
Multi-Functional Improve response
and 24/7 time
Availability
Limitations and Future Scope
Limitations Future Scope
• Dependency on Devices • Integration with Additional
• Dependency on internet Services
• Language • Enhanced platform services
• User Personalization
• Multiple Language options
References
1. https://www.researchgate.net/publication/372394842_Development_o
f_AI-based_voice_assistants_using_Large_Language_Models
2. Artificial Intelligence-based Voice Assistant | IEEE Conference
Publication | IEEE Xplore
3. http://www.ijert.org
4. Voice Assistants: The Present and Future
5. A Comprehensive Review on Speech Emotion Recognition
6. Speech Synthesis with Transformers: A Review
Project Implementation