Abstract
An AI desktop assistant that talks like a human is a system that can perform tasks and provide various services to an individual according to the individual's spoken commands. This is done through a synchronous process involving recognition of speech, followed by a response via synthesized speech. The most famous example is Apple Inc.'s "Siri", which helps the end user communicate with mobile devices using voice and performs actions by delegating requests accordingly. A similar application, "Google Voice Search", has been developed by Google and is used on Android phones. However, these applications rely mostly on Internet services. This project presents a desktop assistant with voice recognition intelligence that takes user input in the form of voice, processes it, and returns output in various forms, such as answering questions, making recommendations, and performing actions for the end user. In the modern world, Artificial Intelligence (AI) has rapidly evolved to become an integral part of our daily lives. AI-based personal assistants have gained popularity due to their ability to perform tasks efficiently and simplify human-computer interaction. The primary goal is to bridge the communication gap between humans and machines, creating a more engaging and satisfying user experience. The assistant leverages the latest advancements in Natural Language Processing (NLP) and AI technologies.
Keywords: SIRI, Google Voice Search, web browser, Internet, speech recognition.
Chapter 1
INTRODUCTION
In today's era almost all tasks are digitized. With a smartphone in hand, we have the world at our fingertips, and these days we do not even need our fingers: we simply speak the task and it is done. An AI voice assistant, also known as a virtual or digital assistant, is a system that uses voice recognition technology, natural language processing, and Artificial Intelligence (AI) to respond to people. Virtual assistants understand natural-language voice commands and perform tasks for users. The AI assistant can also perform other activities such as reading news and weather updates, opening Google or YouTube, telling the time, playing music, and opening and closing applications and websites. This system is designed to be used efficiently on desktops. Personal assistant software improves user productivity by managing the user's routine tasks and by providing information from online sources. Vikram is effortless to use: call the wake word 'Vikram' followed by the command, and within seconds it is executed. This project was started on the premise that there is a sufficient amount of openly available data and information on the web that can be utilized to build a virtual assistant capable of making intelligent decisions for routine user activities.
Despite the availability of multiple virtual assistants, their usage remains limited, particularly due to
issues in voice recognition. Many struggle to understand English spoken with non-native accents,
such as the Indian accent. While these assistants are optimized for mobile devices, desktop integration
is lacking. Therefore, there is a need for a desktop-based virtual assistant that accurately
understands English in an Indian accent. Furthermore, these assistants often fail to answer questions
correctly due to lack of context or intent recognition, requiring continuous optimization and large
amounts of data for efficient performance.
1.2 Fundamentals
The development of an AI desktop assistant relies on several key technologies and concepts that make
the interaction between humans and machines more intuitive. These are:
Natural Language Processing (NLP): NLP is the field of AI that enables machines to
understand, interpret, and respond to human language. This allows the assistant to
understand voice commands or typed instructions and provide meaningful responses.
Speech Recognition: This technology converts spoken words into text that the AI system
can understand. Libraries such as the Python SpeechRecognition package (which can use
Google's Web Speech API) or Microsoft's Speech SDK are often used to build this feature.
Machine Learning (ML): ML models are used to improve the assistant's ability to
understand user input over time. As more interactions occur, the system learns and improves
the accuracy of its responses.
Task Automation: The assistant is designed to execute specific tasks like opening files,
controlling system settings, or sending emails, enhancing productivity and convenience.
Voice Synthesis (Text-to-Speech): To provide responses audibly, text-to-speech
technologies are used, allowing the assistant to "speak" back to the user with the help of
voice generation tools.
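As a concrete illustration of the speech recognition and voice synthesis building blocks described above, the following minimal sketch uses the SpeechRecognition and pyttsx3 libraries (both adopted later in this project). The Indian-English language code and the Google Web Speech backend are assumptions made for illustration, not fixed project choices.

```python
# Minimal listen/speak sketch, assuming SpeechRecognition, PyAudio and pyttsx3 are installed.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
engine = pyttsx3.init()

def listen() -> str:
    """Capture one utterance from the microphone and return it as lowercase text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        # Google's free Web Speech API; "en-IN" targets English spoken in an Indian accent.
        return recognizer.recognize_google(audio, language="en-IN").lower()
    except (sr.UnknownValueError, sr.RequestError):
        return ""  # speech not understood, or the recognition service was unreachable

def speak(text: str) -> None:
    """Read the given text aloud through the default TTS voice."""
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    speak("Hello, I am listening.")
    print("You said:", listen())
```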
1.3 Objectives
The objectives and scope of this AI desktop assistant project encompass the following areas:
Core Functionalities: The assistant is built to perform tasks like speech-to-text conversion,
text-to-speech responses, web searches, file handling, and basic automation of system tasks
like opening applications.
User Interaction: The project focuses on creating a smooth and intuitive interaction
experience for users through voice and text commands.
Language Processing: The assistant will use NLP to understand user commands accurately
and respond accordingly.
Task Management: It can manage routine tasks such as setting reminders, organizing to-do
lists, and controlling media or system functions.
Limitations: The project, in its current state, will support only English and will have limited
integration with external applications. Future versions could include support for more
languages, APIs, and advanced features such as learning from user behaviour.
Chapter 2
Literature Survey
2.1 Introduction:
This chapter provides a review of the literature on AI desktop assistant systems, particularly
focusing on methods implemented using Python. It covers a variety of approaches, including speech
recognition systems, neural networks, and hybrid models. Each technique is evaluated based on its
advantages and limitations, offering insights into the current challenges and areas for improvement.
By summarizing the state of the art in AI assistants, this review aims to identify existing gaps and
opportunities for further research and development in creating more efficient, responsive, and
adaptable desktop assistants.
The techniques discussed in this review are selected based on their relevance to the development of
intelligent desktop assistants, with a focus on enhancing performance, improving user interaction, and
ensuring scalability. The chapter concludes with a summary of the findings, highlighting potential
future directions for research in this domain.
1. "AI Based Voice Assistant Using Python" by Deepak Shende, Ria Umahiya, Monika Raghorte - This paper discusses the design and implementation of a digital assistant. The project is built using open-source software modules with PyCharm Community backing, so it can accommodate any updates in the near future.
3. "Voice Assistant Using Python and AI" by Divisha Pandey, Afra Ali, Shweta Dubey, Muskan Srivastava - The paper describes a new, emerging technology for desktop users. This service is based on the Internet of Things, speech recognition, and other modern technologies such as artificial intelligence, natural language processing, and deep learning.
4. "Virtual Assistant Using Python" by Vedant Kulkarni, Department of Computer Engineering, MAEER's MIT Polytechnic, Pune - The proposed framework overcomes most of the limitations of the existing systems and works according to the given design specification. The developed assistant works more effectively and successfully takes voice inputs and executes the corresponding tasks.
Comparison of the surveyed systems (advantages and limitations):

Sr. No. 2: Rajat Sharma et al. [3]
Advantages: Combines neural networks and NLP to improve context recognition; highly responsive to user inputs in various conditions.
Limitations: Requires large datasets to improve accuracy and responsiveness; struggles with personalization, especially in multi-user environments.

Sr. No. 4: Vedant Kulkarni [5]
Advantages: Provides better performance in recognizing accents, including Indian accents; can automate a wide range of desktop tasks, making it versatile.
Limitations: Requires high computational power, especially for deep learning tasks; the system still faces challenges with handling multi-user environments.
Chapter 3
Project Overview
B. Project Deliverables
What’s being delivered (In Scope):
A desktop-based AI voice assistant, "Vikram," capable of recognizing English spoken
in Indian accents.
Core functionalities include:
o Voice command recognition and processing.
o Task execution like opening applications, playing music, browsing websites,
and checking weather/time.
o Integration with natural language processing (NLP) for improved
conversational abilities.
o Personalization features allowing users to customize commands and
assistant responses.
o Basic conversational AI using OpenAI or similar NLP APIs to handle user queries.
What’s not being delivered (Out of Scope):
Advanced AI features such as deep learning models for emotion recognition or
continuous learning.
Mobile integration or functionality outside of desktop systems.
Support for multiple languages beyond English.
Hardware development (e.g., dedicated smart devices).
Complex, multi-user conversations or enterprise-level system integration.
Clarifications Needed:
The specific desktop platforms supported (Windows, Linux, Mac).
The extent of customization possible by the end-user (e.g., can users create
custom commands?).
Any limitations on the number or type of tasks "Vikram" can handle simultaneously.
This model provides a clear understanding of the project’s scope, deliverables, and
assumptions, helping align project goals with user needs.
C. Project Constraints
1. Technical Constraints:
o Limited Indian accent training data may impact speech recognition accuracy.
o Hardware dependency: Microphone quality varies across desktop systems.
o Reliance on external APIs (OpenAI, speech recognition) can affect response
time, availability, or cost.
o Desktop-specific design limits adaptability to mobile platforms.
2. Resource Constraints:
o API usage costs may limit frequent or advanced queries.
o Team expertise in AI, voice recognition, and NLP may limit feature complexity.
3. Time Constraints:
o Strict development deadlines restrict time for extensive feature refinement.
o API/library updates may cause delays due to compatibility issues.
4. User Constraints:
o Software must run on a range of desktop specs, limiting features for low-
end systems.
o Assumes basic user knowledge of voice assistants.
5. Regulatory & Ethical Constraints:
o Privacy concerns necessitate strict data protection and user consent protocols.
o Data storage must comply with local privacy laws.
3.2 Timeline with Milestones and Gantt Chart
Milestone 3 (17-Aug-2024 to 22-Aug-2024): Sprint Planning - set sprint goals and prioritize tasks.
Gantt Chart
Gantt chart of the AI Voice Desktop Assistant
3.3 Methodology
1. Environment Setup and Library Installation
Create a Virtual Environment: Isolate project dependencies using virtualenv or conda.
Install Required Libraries: Use pip to install essential libraries:
o SpeechRecognition: For speech-to-text conversion.
o pyttsx3 or gTTS: For text-to-speech conversion.
o OpenAI: For accessing OpenAI's API for advanced language models.
o wikipedia: For accessing Wikipedia's knowledge base.
o pyaudio: For audio input/output.
o Other libraries as needed for specific functionalities.
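A quick way to verify that the environment contains the libraries listed above is the small check below. The mapping between pip package names and import names follows the usual convention, and the exact package set is only a sketch of what this project needs.

```python
# Sanity-check the installed packages (run inside the activated virtual environment).
import importlib

PACKAGES = {
    "SpeechRecognition": "speech_recognition",
    "pyttsx3": "pyttsx3",
    "gTTS": "gtts",
    "openai": "openai",
    "wikipedia": "wikipedia",
    "PyAudio": "pyaudio",
}

for pip_name, import_name in PACKAGES.items():
    try:
        importlib.import_module(import_name)
        print(f"{pip_name}: installed")
    except ImportError:
        print(f"{pip_name}: missing - run 'pip install {pip_name}'")
```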
2. Speech Recognition and Text-to-Speech
Speech-to-Text:
o Use SpeechRecognition to capture audio input from a microphone.
o Process the audio to recognize spoken words and convert them into text.
Text-to-Speech:
o Employ pyttsx3 or gTTS to synthesize text into natural-sounding speech.
o Customize the voice, speaking rate, and volume as needed.
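The voice and speed customization mentioned above can be done through pyttsx3 engine properties, as in the sketch below. The index of the alternative voice varies between systems, so picking voices[1] for a female voice is only an assumption that happens to hold on many Windows installations.

```python
# Text-to-speech customization sketch using pyttsx3.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 170)        # speaking speed in words per minute
engine.setProperty("volume", 1.0)      # volume between 0.0 and 1.0
voices = engine.getProperty("voices")
if len(voices) > 1:
    engine.setProperty("voice", voices[1].id)  # often a female voice on Windows
engine.say("Vikram is ready.")
engine.runAndWait()
```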
Chapter 4
Design of Algorithm
5. Task Execution
Execute Commands:
o Based on the identified command, execute the appropriate action:
Web Searches: Use web scraping or APIs to retrieve information.
Music Playback: Access the music library and play the requested track.
Time/Date Queries: Retrieve the current date and time.
Wikipedia Queries: Fetch summaries of articles using the Wikipedia API.
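The command-dispatch step can be sketched as a simple keyword match, as below. The helper relies on the wikipedia and pywhatkit libraries described later in this report; the specific trigger phrases ("wikipedia", "time", "play", "open google") are illustrative assumptions rather than the project's fixed command grammar.

```python
# Keyword-based command dispatch sketch.
import datetime
import webbrowser
import wikipedia
import pywhatkit

def execute(command: str) -> str:
    """Run the action matching the spoken command and return a textual reply."""
    if "wikipedia" in command:
        topic = command.replace("wikipedia", "").strip()
        return wikipedia.summary(topic, sentences=2)
    if "time" in command:
        return datetime.datetime.now().strftime("The time is %I:%M %p")
    if "play" in command:
        song = command.replace("play", "").strip()
        pywhatkit.playonyt(song)               # opens the requested video on YouTube
        return f"Playing {song} on YouTube"
    if "open google" in command:
        webbrowser.open("https://www.google.com")
        return "Opening Google"
    return "Sorry, I did not understand that command."
```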
6. External API Integration
Real-Time Information:
o For commands requiring real-time data (e.g., weather updates, news), call
the appropriate external API.
Data Parsing and Formatting:
o Parse and format the retrieved data to ensure user-friendly output.
7. Response Generation
Generate Text Response:
o Create a response based on the results of the executed command.
Text-to-Speech Conversion:
o Use the text-to-speech system to convert the text response into speech.
8. User Feedback
Output Speech Response:
o Play the generated speech response back to the user.
Prompt for Further Input:
o Ask the user if they have more commands or questions.
9. Loop Back
Continuous Interaction:
o Return to the user input handling step to allow for ongoing interaction.
10. Exit Condition
o Recognize Exit Commands: Provide a mechanism for the user to exit the
assistant (e.g., by recognizing commands like "exit" or "quit").
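Steps 7 to 10 come together in a single interaction loop such as the sketch below. It assumes the listen(), speak() and execute() helpers from the earlier sketches; the wake word and exit phrases are taken from this report, but their exact handling here is only illustrative.

```python
# Overall interaction loop: wake word, command execution, spoken response, exit condition.
WAKE_WORD = "vikram"
EXIT_WORDS = ("exit", "quit", "goodbye")

def main() -> None:
    speak("Vikram is online.")
    while True:
        heard = listen()
        if not heard or WAKE_WORD not in heard:
            continue                              # stay idle until the wake word is heard
        command = heard.replace(WAKE_WORD, "").strip()
        if any(word in command for word in EXIT_WORDS):
            speak("Goodbye.")
            break                                 # exit condition (step 10)
        speak(execute(command))                   # respond, then loop back (steps 7-9)

if __name__ == "__main__":
    main()
```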
2. Microphone: The microphone in an AI desktop assistant is responsible for capturing user voice
commands, which are then processed through various stages, including analog-to-digital
conversion, speech recognition, natural language understanding, response generation, and finally,
user interaction.
3. Speech Recognition: Speech recognition, also known as automatic speech recognition (ASR) or
voice recognition, is a technology that enables a computer or machine to convert spoken language
into text or commands. It is a critical component of various applications and systems, including
voice assistants, transcription services, voice-controlled devices, and more.
5. Cloud Storage:
Speech Recognition: The SpeechRecognition library is used to listen to the words spoken by the user, taken as input from the microphone, and then process them to determine their meaning and convert them into text. This library allows the machine to understand human language.
Pyttsx3: pyttsx3 is a Python text-to-speech library used to make the voice assistant talk to us. It supports common text-to-speech engines that convert text into speech, enabling the assistant to speak to its user. It can be configured to speak in either a male or a female voice as required.
Wikipedia: The wikipedia library is used to get information from Wikipedia on any topic, to look up answers to queries, or simply to perform a Wikipedia search. This library needs an Internet connection to fetch results, and the results are delivered to the user in both text and voice form.
Datetime: This is an essential module supporting date and time functionality. Whenever the user wants to know the current date and time, or wants to schedule a task at a certain time, this module is used.
PyAutoGUI: PyAutoGUI is a Python package that controls the mouse and the keyboard; it can simulate mouse cursor movements as well as button clicks. Given a particular 2-D coordinate, it can click on an exact location on the screen.
PyWhatkit: PyWhatKit is a Python library with a number of features such as sending messages and images through WhatsApp, playing YouTube videos, converting images to ASCII art, sending emails, etc.
Keyboard: keyboard is a Python library that gives the user full control over the keyboard. In particular, the press() and write() functions help control keyboard keys as well as type messages on screen.
Speedtest: The speedtest library is used to test Internet bandwidth. It helps evaluate the upload as well as download speed of the Internet connection, with results reported in megabits per second.
OS: The os module in Python is used for interacting with the operating system. In particular, we use os.startfile() to open any application installed on the system. A short usage sketch of several of these libraries follows below.
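The sketch below shows one plausible use of several of the libraries described above. The screen coordinates, search terms and application path are placeholders chosen for illustration, and os.startfile() is available only on Windows.

```python
# Illustrative one-off calls to the libraries listed above.
import datetime
import os
import pyautogui
import pywhatkit
import speedtest
import wikipedia

print(datetime.datetime.now().strftime("%A, %d %B %Y %I:%M %p"))     # current date and time
print(wikipedia.summary("Artificial intelligence", sentences=1))      # one-line Wikipedia answer

pyautogui.press("volumeup")            # simulate a key press
pyautogui.click(200, 300)              # click at a 2-D screen coordinate

pywhatkit.playonyt("lo-fi music")      # play the requested track on YouTube

st = speedtest.Speedtest()
st.get_best_server()
print(f"Download: {st.download() / 1_000_000:.1f} Mbps, "
      f"Upload: {st.upload() / 1_000_000:.1f} Mbps")

os.startfile(r"C:\Windows\notepad.exe")  # open an installed application (Windows only)
```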
Chapter 5
In this project there is only one user. The user issues a command to the system; the system then
interprets it and fetches the answer, and the response is sent back to the user.
Initially, the system is in idle mode. When it receives the wake-up call it begins execution. The
received command is classified as either a question to be answered or a task to be performed, and
the corresponding action is taken. After the question is answered or the task is performed, the
system waits for another command. This loop continues until it receives a quit command, at which
point it goes back to sleep.
Software Specification
1. Operating System:
o Windows 10 or higher
2. Python:
o Version 3.6 or higher
3. IDE:
o PyCharm or any other Python IDE
4. Required Libraries:
o pyttsx3: Text-to-speech conversion
o SpeechRecognition: Voice command recognition
o PyPDF2: PDF reading capabilities
o smtplib (Python standard library): For sending emails
o pywhatkit: For WhatsApp messaging
o pyautogui: For automating keyboard and mouse tasks
o PyQt or similar: For GUI development
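Since smtplib is listed above for sending emails, the hedged sketch below shows one common way to use it. The Gmail SMTP host, sender address and app password are placeholder assumptions; real credentials should come from configuration rather than source code.

```python
# E-mail sending sketch with the standard-library smtplib module.
import smtplib
from email.message import EmailMessage

def send_email(to_address: str, subject: str, body: str) -> None:
    message = EmailMessage()
    message["From"] = "vikram.assistant@example.com"       # placeholder sender address
    message["To"] = to_address
    message["Subject"] = subject
    message.set_content(body)

    # Assumes a Gmail account with an app password; adjust host/port for other providers.
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login("vikram.assistant@example.com", "app-password-here")
        server.send_message(message)
```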
Chapter 6
6.1 Conclusion
The AI desktop assistant represents a remarkable achievement in the fields of Natural Language
Processing and human-computer interaction. By simulating human-like conversation, this
technology offers new possibilities for personal and professional use, and it significantly enhances
user interaction and usability, making it more intuitive, engaging, and accessible. Ongoing research,
user feedback, and ethical considerations will nevertheless be essential in shaping the future of such
assistants. Moreover, a human-like AI desktop assistant has the potential to transform industries
such as customer service, healthcare, and education. It can revolutionize the way we interact with
technology, creating more inclusive and user-friendly experiences for people of all ages and
backgrounds.
The future scope for AI desktop assistants is vast and promising, as these intelligent virtual
companions continue to evolve and integrate with various aspects of our digital and physical lives.
Key areas of future development include support for more languages, richer API integrations, and
learning from user behaviour, as noted in the limitations discussed earlier.
References
1. Alotto, F., Scidà, I., and Osello, A. (2020). "Building modeling with artificial intelligence and speech recognition for learning purpose." Proceedings of EDULEARN20 Conference, Vol. 6.
2. Beirl, D., Rogers, Y., and Yuill, N. (2019). "Using voice assistant skills in family life." Computer-Supported Collaborative Learning Conference, CSCL, Vol. 1, International Society of the Learning Sciences, Inc., 96–103.
3. Canbek, N. G. and Mutlu, M. E. (2016). "On the track of artificial intelligence: Learning with intelligent personal assistants." Journal of Human Sciences, 13(1), 592–601.
4. Malodia, S., Islam, N., Kaur, P., and Dhir, A. (2021). "Why do people use artificial intelligence (AI)-enabled voice assistants?" IEEE Transactions on Engineering Management.
5. Nasirian, F., Ahmadian, M., and Lee, O.-K. D. (2017). "AI-based voice assistant systems: evaluating from the interaction and trust perspectives."
6. Raja, K. D. P. R. A. (2020). "Jarvis AI using Python."
7. Terzopoulos, G. and Satratzemi, M. (2019). "Voice assistants and artificial intelligence in education." Proceedings of the 9th Balkan Conference on Informatics, 1–6.
8. Sharma, R. and Dwivedi, A. ""JARVIS" - AI Voice Assistant." International Journal of Science and Research (IJSR).
Acknowledgement
We would like to express our sincere gratitude to everyone who has supported us throughout this
project. Your invaluable guidance, constructive criticism, and friendly advice have been
instrumental to our success. We are particularly indebted to Dr. Umakant Gohatre, our project
guide, for his unwavering support, mentorship, and expertise. His guidance and constant
supervision, coupled with his provision of necessary information, have been invaluable. We are
also grateful to our overall major project coordinator, Dr. Umakant Gohatre, for his guidance
throughout the entire project process. We would like to extend our heartfelt thanks to our friends
and family for their unwavering support, encouragement, and the conducive environment they
provided for our project work. Their contributions, including their participation in literature
surveys, have been invaluable to our success.