
A

PROJECT STAGE II REPORT ON

“Voice-Activated Virtual Assistant”


SUBMITTED TO THE SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
AWARD OF THE DEGREE

BACHELOR OF ENGINEERING (Artificial Intelligence and Machine Learning)

BY

Ms. Shirure Sakshi Namdev     EXAM NO: B400990154
Mr. Talole Tejas Dilip        EXAM NO: B400990156
Ms. Argade Swati Babasaheb    EXAM NO: B400990134
Ms. Jadhav Snehal Sanjay      EXAM NO: B400990245

UNDER THE GUIDANCE OF

Prof. Mrs. Khemnar K.C.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING ENGINEERING
Sahyadri Valley College of Engineering and Technology, Rajuri, A/P Rajuri-412411,
Tal. Junnar, Dist. Pune (MS), India
2024-2025

Savitribai Phule Pune University

CERTIFICATE

This is to certify that

Mr. Talole Tejas Dilip, Ms. Shirure Sakshi Namdev,
Ms. Argade Swati Babasaheb, and Ms. Jadhav Snehal Sanjay,

students of B.E. Artificial Intelligence and Machine Learning,
have satisfactorily completed the Project Stage II report

"VOICE-ACTIVATED VIRTUAL ASSISTANT"

on .../... /2025

At
Department of Artificial Intelligence and Machine Learning,
Sahyadri Valley College of Engineering and Technology, Rajuri,
Rajuri - 412411

....................... .......................
Internal Examiner External Examiner
(Prof. Mrs. Khemnar K.C.)              (Prof. )

Certificate by Guide
This is to certify that group [BE-004], whose members are Ms. Shirure Sakshi Namdev,
Mr. Talole Tejas Dilip, Ms. Argade Swati Babasaheb, and Ms. Jadhav Snehal Sanjay, has
completed the Project Stage II work under my guidance and supervision, and that I
have verified the work for its originality in documentation, problem statement,
implementation, and the results presented in the dissertation. Any reproduction of other
relevant work has been made with prior permission, has been given due attribution, and
is included in the references.

Place: Rajuri

Date:                                         (Prof. Mrs. Khemnar K.C.)

ACKNOWLEDGEMENT

"Feeling gratitude and not expressing it is like wrapping a present and not giving it."
The satisfaction that accompanies the successful completion of any task would be incomplete
without mentioning the people who made it possible. I am therefore grateful to a number of
individuals whose professional guidance and encouragement made it a very pleasant
endeavor to undertake this project.
I take this opportunity to express my profound gratitude and deep regard to my
guide and true mentor, Prof. Mrs. Khemnar K.C., for her exemplary
guidance, monitoring, and constant encouragement throughout the course of this project. I
also express a deep sense of gratitude to Prof. Mrs. Khemnar K.C., Head of the AI&ML
Engineering Department, for her valuable guidance and encouragement.
I am obliged to our Principal, Dr. S. B. Zope, and Vice Principal, Prof. P. Balaramudu, for
their inspiration and co-operation.
Last but not least, I express my gratitude to all the staff members of Sahyadri
Valley College of Engineering, with special thanks to my family, colleagues, and friends for
their moral support and help.

Ms. Shirure Sakshi Namdev.


Mr. Talole Tejas Dilip.
Ms. Argade Swati Babasaheb.
Ms. Jadhav Snehal Sanjay.
SVCET, Rajuri.

LIST OF PUBLICATIONS

List of Tables

Table 4.2: Implementation Plan

ABSTRACT

A virtual assistant is a software agent that can perform tasks or services for an individual.
The term "chatbot" is sometimes used to refer to virtual assistants in general, or specifically
to those accessed by online chat. This report discusses ways in which new technology could
be harnessed to create an intelligent Virtual Personal Assistant (VPA) with a focus on user-
based information. The project is a technical brief on virtual assistant technology and its
opportunities and challenges in different areas, and it also describes the challenges of
applying the technology. In today's advanced hi-tech world, the need for independent living
is widely recognized for visually impaired people, whose main problem is social restriction:
they struggle in unfamiliar surroundings without manual aid. Because visual information is
the basis for most tasks, visually impaired people are at a disadvantage when the necessary
information about the surrounding environment is not available. With recent advances in
inclusive technology, it is possible to extend the support given to people with visual
impairment.

Keywords: Voice-Activated Virtual Assistant, Artificial Intelligence, Natural Language Processing, Speech Recognition, Machine Learning, User Interaction, Smart Technology

INDEX

Acknowledgement
List of Publications
List of Tables
Abstract
Index
Synopsis
1 INTRODUCTION
1.1 Aim and Purpose
1.2 Scope of the Project
1.3 Objectives of the Project
2 LITERATURE SURVEY
2.1 Survey on Virtual Assistant: Google Assistant, Siri, Alexa
2.2 Survey on Smart Virtual Voice Assistant
2.3 Survey on Personal Voice Assistant
2.4 Survey on Personal Desktop Virtual Voice Assistant using Python
3 METHODOLOGY
3.1 Overview of Existing System
3.2 Proposed System
3.3 Objective of Project
4 IMPLEMENTATION PLAN
4.1 Data Flow Diagram
4.1.1 Level 0 Data Flow Diagram
4.1.2 Level 1 Data Flow Diagram
4.1.3 ER Diagram
4.2 Implementation Plan
5 SOFTWARE REQUIREMENT SPECIFICATION
5.1 Libraries
5.1.1 pyttsx3
5.1.2 speech_recognition
5.1.3 wikipedia
5.1.4 webbrowser
5.1.5 datetime
5.1.6 time
5.1.7 requests
5.1.8 pywhatkit
5.1.9 pyautogui
5.2 Programming Languages
5.2.1 Python
5.3 Types of Operation
5.3.1 Information
6 SYSTEM DESIGN
6.1 Requirement Analysis
6.2 System Architecture
6.3 Use Case Diagram
7 OUTCOMES
8 CODING SECTION
9 OUTPUT
10 SYSTEM OVERVIEW
10.1 Advantages
10.2 Disadvantages of Existing System
10.3 Features
10.4 Future Scope
11 CONCLUSION
12 OTHER DOCUMENTATION
12.1 Based Paper
12.2 Published Paper
12.3 Plagiarism

SYNOPSIS

Title: Voice-Activated Virtual Assistant in Python

Objective:
1. To develop a Python-based virtual assistant capable of performing various tasks based on
voice commands, including web searches, YouTube interactions, and Wikipedia queries,
while providing spoken feedback to the user.
2. Voice Command Execution: To develop a virtual assistant that can accurately recognize
and execute various user commands given via speech, such as performing web searches,
playing media on YouTube, and retrieving information from Wikipedia.
3. Interactive Feedback: To provide real-time, spoken feedback to the user based on the
recognized commands, ensuring a smooth and engaging interaction through the text-to-speech
functionality.

Abstract:
A virtual assistant is a software agent that can perform tasks or services for an individual.
The term "chatbot" is sometimes used to refer to virtual assistants in general, or specifically
to those accessed by online chat. This report discusses ways in which new technology could
be harnessed to create an intelligent Virtual Personal Assistant (VPA) with a focus on user-
based information. The project is a technical brief on virtual assistant technology and its
opportunities and challenges in different areas, focusing on the types of virtual assistants and
the structural elements of a virtual assistant system. In this project we studied virtual
environments and virtual assistant interfaces, and we present applications of virtual
assistants that provide opportunities for humanity in various domains, along with the
challenges of applying the technology. In today's advanced hi-tech world, the need for
independent living is widely recognized for visually impaired people, whose main problem is
social restriction: they struggle in unfamiliar surroundings without manual aid. Because
visual information is the basis for most tasks, visually impaired people are at a disadvantage
when the necessary information about the surrounding environment is not available. With
recent advances in inclusive technology, it is possible to extend the support given to people
with visual impairment.

Problem Statement:
In today’s fast-paced world, users are increasingly looking for hands-free, quick, and
convenient ways to interact with their devices. A voice-activated assistant can provide a
solution by allowing users to perform tasks, retrieve information, and control other
applications or devices using only their voice. The project aims to develop a Python-based
voice assistant that listens to user commands, interprets the intent, and executes the requested
action.

Hardware Requirements
1. A computer or laptop with a working microphone
2. Speakers or headphones for audio output

Software Requirements
1. Python 3.x
2. Libraries: SpeechRecognition, pyttsx3, wikipedia, webbrowser, PyAutoGUI, time,
pywhatkit

Methodology
1. Initialization: The script initializes the speech recognition and text-to-speech engines.
2. Speech Recognition: Captures and converts spoken words into text using the
SpeechRecognition library.
3. Command Handling: Analyses the text to determine the appropriate action, including:
• Opening a Google or YouTube search result.
• Playing a YouTube video.
• Retrieving Wikipedia summaries.
4. Feedback: Uses text-to-speech to provide responses and feedback to the user.
5. Loop: Continuously listens for commands until a termination
command ("good bye") is detected. A minimal sketch of this loop is shown below.

Data flow diagram (DFD):


I. DFD (Level 0):

II. DFD (Level 1):

Expected Outcomes
1. The virtual assistant will accurately recognize and respond to predefined voice commands.
2. Users will be able to perform Google searches, YouTube searches and playback, and
retrieve Wikipedia information through voice commands.
3. The assistant will provide spoken feedback, ensuring an interactive experience.

Conclusion:
The script demonstrates the feasibility of creating a functional virtual assistant using
Python with minimal dependencies. By integrating speech recognition and text-to-speech
functionalities, the script offers a practical tool for users seeking a basic voice-controlled
assistant. Future improvements could include more sophisticated command parsing,
enhanced error handling, and additional functionalities.

Chapter 1

INTRODUCTION
Voice assistants are artificial intelligence (AI) systems that enable users to interact with
devices and perform tasks using natural language voice commands. Voice assistants have
become increasingly popular in recent years, with many people using them to control smart
devices, access information, and perform a variety of tasks on their smartphones, smart
speakers, and other devices.
Voice assistants use natural language processing (NLP) algorithms and machine learning
techniques to understand and respond to user requests. They can be activated using a specific
trigger word or phrase, such as "Hey Siri" or "Ok Google," and can perform a wide range of
tasks, such as answering questions, setting reminders, playing music, or controlling smart
home devices.
Voice assistants have the potential to make many everyday tasks more convenient and
efficient, as they allow users to interact with devices and systems using their voice rather than
requiring them to use a physical interface or input commands manually. However, voice
assistants also raise privacy and security concerns due to the sensitive personal data that they
may collect, store, and process.
Overall, voice assistants are an emerging and rapidly evolving technology that has the
potential to transform how people interact with devices and systems, and they will likely
continue to play an important role in the development of AI and the Internet of Things (IoT).

1.1 Aim and Purpose

The purpose of a virtual assistant is to be capable of voice interaction, music playback,
making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing
weather, traffic, sports, and other real-time information such as news.
Virtual assistants enable users to speak natural language voice commands in order to operate
the device and its apps. There is an increased overall awareness and a higher level of comfort
demonstrated specifically by millennial consumers. In this ever-evolving digital world, where
speed, efficiency, and convenience are constantly being optimized, it is clear that we are
moving towards less screen interaction.

1.2 Scope of the Project

Virtual assistants will continue to offer more individualized experiences as they get better at
differentiating between voices. However, it is not just developers who need to address the
complexity of developing for voice: brands also need to understand the capabilities of each
device and integration, and whether it makes sense for their specific brand. They will also
need to focus on maintaining a consistent user experience in the coming years as
complexity becomes more of a concern.

This is because the visual interface of virtual assistants is missing: users simply cannot see
or touch a voice interface. Virtual assistants are software programs that help you ease your
day-to-day tasks, such as showing the weather report or playing music. They can take
commands via text (online chatbots) or by voice.

1.3 Objectives of the Project

The main objective of building personal assistant software (a virtual assistant) is to use
semantic data sources available on the web and user-generated content, and to provide
knowledge from knowledge databases. The main purpose of an intelligent virtual assistant is
to answer questions that users may have. This may be done in a business environment, for
example on the business website, with a chat interface. On the mobile platform, the
intelligent virtual assistant is available as a call-button operated service where a voice asks
the user "What can I do for you?" and then responds to verbal input. Virtual assistants can
save you a tremendous amount of time. We spend hours on online research and then on
writing up the report in our own terms of understanding; instead, you can provide a topic for
research and continue with your tasks while the assistant does the research. Another difficult
task is remembering test dates, birthdays, or anniversaries: it comes as a surprise when you
enter the class and realize there is a class test today. Just tell the assistant about your tests in
advance, and it reminds you well ahead of time so you can prepare. One of the main
advantages of voice search is its rapidity. In fact, voice is reputed to be four times faster than
a written search: whereas we can write about 40 words per minute, we are capable of
speaking around 150 in the same period of time. In this respect, the ability of personal
assistants to accurately recognize spoken words is a prerequisite for their adoption by
consumers.

CHAPTER 2

LITERATURE SURVEY

2.1 Survey on Virtual Assistant: Google Assistant, Siri, Alexa

Authors: Amrita S. Tulshan and Sudhir Namdeorao Dhage


Virtual assistants are a boon for everyone in this new era of the 21st century. They have
paved the way for a new technology where we can ask questions of a machine and interact
with IVAs as people do with humans. This technology has attracted almost the whole world
in many forms, such as smartphones, laptops, and computers. Some of the significant VPAs
are Siri, Google Assistant, Cortana, and Alexa. Voice recognition, contextual understanding,
and human interaction are issues that are not yet solved in these IVAs. To address them, 100
users participated in a survey for this research and shared their experiences. Each user's task
was to put the survey's questions to all the personal assistants, and from their experiences
this research paper arrived at its results. According to those results, many services are
covered by these assistants, but improvements are still required in voice recognition,
contextual understanding, and hands-free interaction. Addressing these improvements in
IVAs, and thereby increasing their use, is the main goal of the paper.

2.2 Survey on Smart Virtual Voice Assistant

Authors: Manjusha Jadhav, Krushna Kalyankar, Ganesh Narkhede and Swapnil Kharose

In this modern era, day-to-day life has become smarter and interlinked with technology. We
already know voice assistants like Google Assistant and Siri. The voice assistant system
described here can act as a smart friend, daily schedule manager, to-do writer, calculator,
and search tool. The project works on speech input and gives output through speech and
text on screen. The assistant connects to the World Wide Web to provide the results the user
requires. Natural language processing algorithms help machines engage in communication
using natural human language in many forms.

2.3 Survey on Personal Voice Assistant

Authors: S. Lahari, A. Naveen, G. Sarath Chandra

Digitization brings new possibilities to ease our daily life activities by means of assistive
technology. Amazon Alexa, Apple Siri, Microsoft Cortana, and Samsung Bixby, to name only
a few, have been successful in the age of smart personal assistants (SPAs). A voice assistant
is defined as a digital assistant that combines artificial intelligence, machine learning, speech
recognition, natural language processing (NLP), speech synthesis, and various actuation
mechanisms to sense and influence the environment. Different NLP techniques are used to
convert speech to text (STT), process the text, convert text to speech (TTS), and add various
functionalities. However, SPA research seems to be highly fragmented among different
disciplines, such as computer science, human-computer interaction, and information systems,
which leads to 'reinventing the wheel' approaches and thus impedes progress and conceptual
clarity. In this paper, we present an exhaustive, integrative literature review to build a solid
basis for future research. Hence, we contribute by providing a consolidated, integrated view
of prior research and lay the foundation for an SPA classification scheme.

2.4 Survey on Personal Desktop Virtual Voice Assistant using Python

Authors: Prof. Suresh V. Reddy, Chandresh Chhari, Prajwal Wakde, Nikhil Kamble

In today's generation, how cool is it to build your own personal assistant like Alexa
or Siri? It is not very complex and can be done effortlessly in Python. Personal virtual
assistants have been capturing a lot of attention lately, and chatbots are common on most
business websites. The main agenda of this voice assistant is to make people smarter and
give instant, computed results. Another fundamental aim of a voice assistant is to reduce
the use of input devices like the keyboard, mouse, and touch pens, which lessens both the
hardware cost and the space taken by such devices.

CHAPTER 3

Methodology
3.1 Overview of existing system

A virtual voice assistant is a software program that utilizes natural language processing and
voice recognition technologies to understand and respond to spoken commands and queries.
It allows users to interact with their devices, applications, and services using voice
commands, and can perform a wide range of tasks such as making phone calls, scheduling
appointments, setting reminders, and providing information. Some popular examples of
virtual voice assistants include Amazon Alexa, Google Assistant, and Apple Siri. These AI-
powered systems can be integrated with other devices and services to create a more seamless
and convenient user experience.

3.2 Proposed System

We propose an efficient way of implementing a personal voice assistant.

The Speech Recognition library has many built-in functions that let the assistant understand
the command given by the user, and the response is sent back to the user as voice, using
text-to-speech functions. When the assistant captures the voice command given by the user,
the underlying algorithms convert the voice into text. Then, according to the keywords
present in the text (the command given by the user), the respective action is performed by
the assistant. This is made possible by the functions present in different libraries. The
assistant also achieves some of its functionality with the help of APIs: we used APIs for
functionality like performing calculations, extracting news from web sources, and telling
the weather. We send a request and, through the API, receive the respective output. APIs
like WolframAlpha are very helpful for things like calculations and small web searches, and
for getting data from the web. In this way we are able to extract news from web sources and
send it as input to a function for further processing. We also use libraries such as random
and many others, each corresponding to a different purpose. We used the os library to
implement operating-system-related functionality such as shutting down or restarting the
system.
At the outset we make our program capable of using the system voice with the help of sapi5
and pyttsx3. pyttsx3 is a text-to-speech conversion library in Python; unlike alternative
libraries, it works offline and is compatible with both Python 2 and 3. The Speech
Application Programming Interface (SAPI) is an API developed by Microsoft to allow the
use of speech recognition and speech synthesis within Windows applications. We then define
the speak function to enable the program to speak its outputs. After that we define a function
to take voice commands using the system microphone. The main function is then defined,
where all the capabilities of the program are wired together.
The proposed system has the following functionality: (a) the system keeps listening for
commands, and the listening window is variable and can be changed according to user
requirements; (b) if the system cannot gather information from the user input, it keeps asking
the user to repeat, up to a desired number of times; (c) the system can have both male and
female voices according to user requirements; (d) features supported in the current version
include playing music, reading texts, searching Wikipedia, opening installed applications,
opening anything in the web browser, and so on. A minimal sketch of the speak and
voice-capture functions follows.
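A minimal sketch of these two functions, assuming the sapi5 driver on Windows; the function names and the listening window are illustrative:

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)  # change the index to switch voices

def speak(text):
    """Speak the given text through the system voice."""
    engine.say(text)
    engine.runAndWait()

def take_command(listen_seconds=5):
    """Capture a voice command from the microphone and return it as text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source, phrase_time_limit=listen_seconds)
    try:
        return recognizer.recognize_google(audio, language='en-in')
    except sr.UnknownValueError:
        speak("Could you please repeat that?")
        return ""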

3.3 Objective of Project

The main objective of building personal assistant software (a virtual assistant) is to use
semantic data sources available on the web and user-generated content, and to provide
knowledge from knowledge databases. The main purpose of an intelligent virtual assistant is
to answer questions that users may have. This may be done in a business environment, for
example on the business website, with a chat interface. On the mobile platform, the
intelligent virtual assistant is available as a call-button operated service where a voice asks
the user "What can I do for you?" and then responds to verbal input. Virtual assistants can
save you a tremendous amount of time: provide a topic for research and continue with your
tasks while the assistant does the research. Another difficult task is remembering test dates,
birthdays, or anniversaries; just tell the assistant about your tests in advance, and it reminds
you well ahead of time so you can prepare. One of the main advantages of voice search is its
rapidity: voice is reputed to be four times faster than a written search, since we can write
about 40 words per minute but speak around 150 in the same period of time. In this respect,
the ability of personal assistants to accurately recognize spoken words is a prerequisite for
their adoption by consumers.

Chapter 4

IMPLEMENTATION PLAN
Planning plays an important role in the successful completion of a project. This plan
acts as a checklist of the tasks to be done. It helps in stating the amount of work to be done
in a stipulated period of time, and it makes it possible to judge ourselves against the time
chart and set milestones for the project.

4.1 Data Flow Diagram
4.1.1 Level 0 Data Flow Diagram

Figure 4.1.1 Level 0 Data Flow Diagram

4.1.2 Level 1 data flow diagram


Figure 4.1.2 Level 1 Data Flow Diagram
4.1.3 ER Diagram
4.2 Implementation Plan

Month           Schedule         Project Task

July 2024       1st Week         Idea about project selection / project topic selection.
July 2024       2nd & 3rd Week   Submission of project synopsis/abstract. Guide allocation and literature survey.
July 2024       4th Week         Discussion with guide about the project. Literature survey. First presentation with guide about the project idea.
August 2024     1st & 2nd Week   Requirement analysis (SRS document) preparation and submission. Design of project (mathematical model, UML diagrams).
August 2024     3rd & 4th Week   Presentation of the design. Preparation of the preliminary report.
September 2024  1st Week         Project Stage I. Coding (at least 2 modules finished, 30% of total work).
September 2024  2nd & 3rd Week   First demonstration of project work; 60% of total work expected.
September 2024  4th Week         Discussion with project guide. Test plan, design, and installation.
October 2024    1st & 2nd Week   Completion of remaining work and any changes suggested by the project guide. Final project demonstration.
October 2024    3rd & 4th Week   Preparation of the Project Stage I report, installation, and manual preparation. Submission of report (Project Stage I).

Table 4.2: Implementation Plan

Chapter 5
SOFTWARE REQUIREMENT SPECIFICATION

5.1 Libraries:
5.1.1 pyttsx3
pyttsx3 is a text-to-speech conversion library in Python, used to convert the text given in
parentheses to speech. It is compatible with Python 2 and 3. An application invokes the
pyttsx3.init() factory function to get a reference to a pyttsx3 engine. It is a very easy-to-use
tool that converts the entered text into speech. The pyttsx3 module supports two voices, one
female and one male, which are provided by "sapi5" on Windows.
Command to install: pip install pyttsx3
It supports three TTS engines:
- sapi5 – SAPI5 on Windows
- nsss – NSSpeechSynthesizer on Mac OS X
- espeak – eSpeak on every other platform
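A short usage sketch (the available voices and their order depend on the machine):

import pyttsx3

engine = pyttsx3.init('sapi5')              # on Windows; pyttsx3.init() elsewhere
voices = engine.getProperty('voices')       # list the installed voices
engine.setProperty('voice', voices[0].id)   # pick a voice by index
engine.say("Hello, I am your virtual assistant.")
engine.runAndWait()                         # block until speaking finishes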

5.1.2 speech_recognition
This library allows computers to understand human language. Speech recognition is a
machine's ability to listen to spoken words and identify them. We can then use speech
recognition in Python to convert the spoken words into text, make a query, or give a reply.
Python supports many speech recognition engines and APIs, including the Google Speech
Engine and the Google Cloud Speech API.
Command to install: pip install SpeechRecognition
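A short usage sketch of recognizing one spoken phrase with the Google Speech engine:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    print("Listening...")
    audio = recognizer.listen(source)
try:
    print("You said:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Sorry, I could not understand that.")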

5.1.3 wikipedia
This is a Python library that makes it easy to access and parse data from Wikipedia: search
Wikipedia, get article summaries, get data like links and images from a page, and more.
Wikipedia itself is a multilingual online encyclopedia.
Command to install: pip install wikipedia
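For example, fetching a two-sentence article summary:

import wikipedia

summary = wikipedia.summary("Python (programming language)", sentences=2)
print(summary)  # a short plain-text summary of the article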

5.1.4 webbrowser
The webbrowser module is a convenient web browser controller. It provides a high-level
interface that allows displaying web-based documents to users. webbrowser can also be used
as a CLI tool. It accepts a URL as the argument, with the following optional parameters: -n
opens the URL in a new browser window, if possible, and -t opens the URL in a new browser
tab. This is a built-in module, so installation is not required.
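For example, opening pages in the default browser:

import webbrowser

webbrowser.open("https://www.google.com/search?q=virtual+assistant")  # default window
webbrowser.open_new_tab("https://www.youtube.com")  # new tab, if the browser supports it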

5.1.5 datetime
This module is used to get the date and time for the user. It is a built-in module, so there is
no need to install it externally. The Python datetime module supplies classes to work with
dates and times. Dates and datetimes are objects in Python, so when we manipulate them we
are actually manipulating objects, not strings or timestamps.
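For example, formatting the current date and time for spoken output:

import datetime

now = datetime.datetime.now()
print(now.strftime("%A, %d %B %Y"))  # e.g. "Monday, 01 July 2024"
print(now.strftime("%H:%M:%S"))      # 24-hour clock time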
5.1.6 time
This module provides many ways of representing time in code, such as objects, numbers, and
strings. It also provides functionality other than representing time, like waiting during code
execution and measuring the efficiency of our code. This is a built-in module, so installation
is not necessary.
5.1.7 requests
The requests module allows you to send HTTP requests using Python. An HTTP request
returns a Response object with all the response data. With it, we can add content like
headers, form data, multipart files, and parameters via simple Python code, and access the
response data in the same way.
Command to install: pip install requests
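A hedged sketch of a weather lookup with requests; OpenWeatherMap is one of the APIs listed later in this report, and the API key and city here are placeholders:

import requests

url = "https://api.openweathermap.org/data/2.5/weather"
params = {"q": "Pune", "appid": "YOUR_API_KEY", "units": "metric"}  # placeholder key
response = requests.get(url, params=params, timeout=10)
if response.status_code == 200:
    data = response.json()  # the Response object parses its own JSON body
    print("Temperature:", data["main"]["temp"], "degrees Celsius")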

5.1.8 pywhatkit
Python provides numerous libraries, and pywhatkit is one of them. The pywhatkit module can
send WhatsApp messages from a Python script: using it, we can send a message to the
desired number with a few lines of code. It uses WhatsApp Web to send these messages. It
also provides helpers for Google search and YouTube playback.
Command to install: pip install pywhatkit
5.1.9 pyautogui
PyAutoGUI is a cross-platform Python module that allows you to programmatically control
the mouse and keyboard, making it a powerful tool for automating tasks within graphical
user interfaces (GUIs).
Command to install: pip install pyautogui
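For example, simulating a few keyboard actions:

import pyautogui

pyautogui.hotkey("win", "d")                 # Windows shortcut: show the desktop
pyautogui.typewrite("hello", interval=0.1)   # type text, 0.1 s between keystrokes
pyautogui.press("enter")                     # press a single key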

5.2 PROGRAMMING LANGUAGES

5.2.1 Python
Python is an object-oriented, high-level, interpreted programming language. It is a robust,
highly useful language focused on rapid application development (RAD). Python makes
writing and executing code easy, and can often implement the same logic in as little as
one-fifth of the code required by other object-oriented languages. The usage of Python is
such that it cannot be limited to only one activity; its growing popularity has allowed it to
enter some of the most popular and complex fields like Artificial Intelligence (AI), Machine
Learning (ML), natural language processing, and data science. Python has libraries for
every need of this project; the libraries used here include speech recognition to recognize
voice, pyttsx3 for text-to-speech, and selenium for web automation.

Python owes this to the following strengths:

• Easy to learn and understand - Python's syntax is simpler, so the language is
comparatively straightforward, even for beginners, to learn and understand.
• Multi-purpose language - Python is a multi-purpose programming language because
it supports structured programming and object-oriented programming as well as
functional programming.
• Support of the open-source community - As an open-source programming language,
Python is supported by a very large developer community. Because of this, bugs are
easily fixed by the Python community. This characteristic makes Python very robust
and adaptive.

5.3 TYPES OF OPERATION

5.3.1 Information:
If we ask for some information, the assistant opens Wikipedia and asks us for the topic on
which we want information; it then clicks on the Wikipedia search box using its XPath,
searches the topic in the search box, clicks the search button using the button's XPath, and
reads a paragraph about that topic.

Keyword: information

• News of the day:
If we ask for the news, it reads out the Indian news of the day on which it is asked.

Keyword: news

• Temperature and Weather:
If the user asks for the temperature, it gives the current temperature.

Keyword: temperature

• Date and Time:
If the user asks for the date or time, the assistant tells it.

Keyword: date or time, or date and time

• Tells its name:
The assistant tells its name if the user asks. The name of the assistant is Next
Gen Optimal Assistant JARVIS.

Keyword: name

A sketch of this keyword dispatch is shown after this list.
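A minimal sketch of such a dispatcher, assuming the keywords above; the web-backed operations are stubbed out here because they need the APIs described in Chapter 6:

import datetime

ASSISTANT_NAME = "Next Gen Optimal Assistant JARVIS"

def handle(command):
    """Map a recognized command to an action using the keywords listed above."""
    command = command.lower()
    if "date" in command or "time" in command:
        return datetime.datetime.now().strftime("%A %d %B %Y, %H:%M")
    if "name" in command:
        return f"My name is {ASSISTANT_NAME}."
    if any(keyword in command for keyword in ("information", "news", "temperature")):
        return "This keyword is routed to its web lookup in the full assistant."
    return "Command not recognized."

print(handle("what is the time"))    # e.g. "Tuesday 01 July 2025, 14:30"
print(handle("tell me your name"))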

Chapter 6
SYSTEM DESIGN
6.1 Requirement Analysis
In order to effectively design and develop a system, it is important to understand and
document the requirements of the system. The process of gathering and documenting the
requirements of a system is known as requirement analysis. It helps to identify the goals of
the system, the stakeholders, and the constraints within which the system will be developed.
The requirements serve as a blueprint for the development of the system and provide a
reference point for testing and validation.

● Hardware Requirements
• Processor – 2.3 GHz or more
• RAM – 4 GB or more
• Disk Space – 50 GB or more
• Input Devices – Microphone & Keyboard
• Output Devices – Speaker & Monitor
• Internet Connection

● Software Requirements
• Python 3.12
• APIs
• News API
• WolframAlpha API
• OpenWeatherMap API
• TMDB API
• DreamStudio API

6.2 System Architecture


6.3 Use Case Diagram

Figure 6.3.1 Use Case Diagram (Voice Assistant)


Chapter 7

OUTCOMES

The outcomes of a virtual voice assistant project can be quite broad and vary depending on
the goals and specific implementation of the assistant. Here are some key outcomes that such
a project could yield:

1. Enhanced User Experience
- Personalized Interaction: Voice assistants can provide a tailored experience by recognizing
individual users, adapting responses, and remembering preferences.
- Convenience and Accessibility: By enabling hands-free interactions, voice assistants can
make services more accessible, particularly for those with disabilities.
- Faster Query Resolution: Users can get quick answers to common questions, enhancing
satisfaction and reducing wait times for human assistance.

2. Operational Efficiency
- Automated Routine Tasks: Assistants can handle tasks like scheduling, reminders, and
FAQs, reducing the workload on support staff.
- 24/7 Availability: A virtual assistant can operate around the clock, providing support
outside of business hours and increasing efficiency.
- Reduced Costs: With automation, organizations can save on labor costs, as the need for
human intervention in routine tasks decreases.

3. Data Collection and Insights
- User Behaviour Analysis: Voice assistants collect valuable data on user interactions, which
can help organizations understand common user needs and behaviours.
- Improved Decision-Making: Data analytics from voice interactions can guide business
decisions, such as product improvements, customer service training, or marketing strategies.

4. Increased Engagement and Retention
- Enhanced Loyalty: A seamless, engaging experience can build brand loyalty, encouraging
users to return.
- Multichannel Integration: By integrating with other platforms and devices, virtual
assistants create an ecosystem that keeps users engaged across multiple touchpoints.

5. Innovation and Competitive Advantage
- Differentiation: Offering a voice assistant can set an organization apart from competitors
and improve brand perception as a tech-savvy, innovative company.
- Continuous Improvement: Regular updates and improvements to the assistant keep it
relevant, making it a dynamic asset that evolves with user expectations.
Chapter 8

CODING SECTION
import asyncio
from random import randint
from PIL import Image
import requests
from dotenv import get_key
import os
from time import sleep

# Function to open and display images based on a given prompt
def open_images(prompt):
    folder_path = r"Data"  # Folder where the images are stored
    prompt = prompt.replace(" ", "_")

    # Generate the filenames for the images
    Files = [f"{prompt}{i}.jpg" for i in range(1, 5)]

    for jpg_file in Files:
        image_path = os.path.join(folder_path, jpg_file)
        try:
            # Try to open and display the image
            img = Image.open(image_path)
            print(f"Opening image: {image_path}")
            img.show()
            sleep(1)  # Pause for 1 second before showing the next image
        except IOError:
            print(f"Unable to open {image_path}")

# API details for the Hugging Face Stable Diffusion model
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": f"Bearer {get_key('.env', 'HuggingFaceAPIKey')}"}

# Async function to send a query to the Hugging Face API
async def query(payload):
    response = await asyncio.to_thread(requests.post, API_URL, headers=headers, json=payload)
    return response.content

# Async function to generate images based on the given prompt
async def generate_images(prompt: str):
    tasks = []

    # Create 4 image generation tasks
    for _ in range(4):
        payload = {
            "inputs": f"{prompt}, quality=4K, sharpness=maximum, Ultra High details, high resolution, seed={randint(0, 1000000)}",
        }
        task = asyncio.create_task(query(payload))
        tasks.append(task)

    # Wait for all tasks to complete
    image_bytes_list = await asyncio.gather(*tasks)

    # Save the generated images to files
    for i, image_bytes in enumerate(image_bytes_list):
        with open(fr"Data\{prompt.replace(' ', '_')}{i + 1}.jpg", "wb") as f:
            f.write(image_bytes)

# Wrapper function to generate and open images
def GenerateImages(prompt: str):
    asyncio.run(generate_images(prompt))  # Run the async image generation
    open_images(prompt)  # Open the generated images

# Main loop to monitor for image generation requests
while True:
    try:
        # Read the status and prompt from the data file
        with open(r"Frontend\Files\ImageGeneration.data", "r") as f:
            Data: str = f.read()

        Prompt, Status = Data.split(",")

        # If the status indicates an image generation request
        if Status == "True":
            print("Generating Images...")
            GenerateImages(prompt=Prompt)

            # Reset the status in the file after generating images
            with open(r"Frontend\Files\ImageGeneration.data", "w") as f:
                f.write("False,False")
            break  # Exit the loop after processing the request
        else:
            sleep(1)  # Wait for 1 second before checking again

    except (FileNotFoundError, ValueError):
        sleep(1)  # Data file missing or malformed; retry after a short wait

#############################################################################
# Import required libraries
from AppOpener import close, open as appopen  # Functions to open and close apps.
from webbrowser import open as webopen  # Web browser functionality.
from pywhatkit import search, playonyt  # Google search and YouTube playback.
from dotenv import dotenv_values  # Manage environment variables.
from bs4 import BeautifulSoup  # Parse HTML content.
from rich import print  # Styled console output.
from groq import Groq  # AI chat functionality.
import webbrowser  # Opening URLs.
import subprocess  # Interacting with the system.
import requests  # Making HTTP requests.
import keyboard  # Keyboard-related actions.
import asyncio  # Asynchronous programming.
import os  # Operating system functionality.

# Aliased so it does not shadow pywhatkit's search imported above.
from googlesearch import search as google_search

# Load environment variables from the .env file.
env_vars = dotenv_values(".env")
GroqAPIKey = env_vars.get("GroqAPIKey")  # Retrieve the Groq API key.

# CSS classes for parsing specific elements in Google's HTML content.
classes = ["zCubwf", "hgKElc", "LTKOO sY7ric", "Z0LcW", "gsrt vk_bk FzvWSb YwPhnf",
           "pclqee", "tw-Data-text tw-text-small tw-ta",
           "IZ6rdc", "O5uR6d LTKOO", "vlzY6d", "webanswers-webanswers_table__webanswers-table",
           "dDoNo ikb4Bb gsrt", "sXLaOe",
           "LWkfKe", "VQF4g", "qv3Wpe", "kno-rdesc", "SPZz6b"]

# Define a user-agent for making web requests.
useragent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36')

# Initialize the Groq client with the API key.
client = Groq(api_key=GroqAPIKey)

# Predefined professional responses for user interactions.
professional_responses = [
    "Your satisfaction is my top priority; feel free to reach out if there's anything else I can help you with.",
    "I'm at your service for any additional questions or support you may need - don't hesitate to ask.",
]

# List to store chatbot messages.
messages = []

# System message to provide context to the chatbot.
SystemChatBot = [{"role": "system", "content": f"Hello, I am {os.environ['Username']}, You're a content writer. You have to write content like letters, codes, applications, essays, notes, songs, poems etc."}]

# Function to perform a Google search.
def GoogleSearch(Topic):
    search(Topic)  # Use pywhatkit's search function to perform a Google search.
    return True  # Indicate success.

# Function to generate content using AI and save it to a file.
def Content(Topic):

    # Nested function to open a file in Notepad.
    def OpenNotepad(File):
        default_text_editor = "notepad.exe"  # Default text editor.
        subprocess.Popen([default_text_editor, File])  # Open the file in Notepad.

    # Nested function to generate content using the AI chatbot.
    def ContentWriterAI(prompt):
        messages.append({"role": "user", "content": f"{prompt}"})  # Add the user's prompt to messages.

        completion = client.chat.completions.create(
            model="llama-3.3-70b-versatile",  # Specify the AI model.
            messages=SystemChatBot + messages,  # Include system instructions and chat history.
            max_tokens=1024,  # Limit the maximum tokens in the response.
            temperature=1,  # Adjust response randomness.
            top_p=1,  # Use nucleus sampling for response diversity.
            stream=True,  # Enable streaming response.
            stop=None  # Allow the model to determine stopping conditions.
        )

        Answer = ""  # Initialize an empty string for the response.

        # Process streamed response chunks.
        for chunk in completion:
            if chunk.choices[0].delta.content:  # Check for content in the current chunk.
                Answer += chunk.choices[0].delta.content  # Append the content to the answer.

        Answer = Answer.replace("</s>", "")  # Remove unwanted tokens from the response.
        messages.append({"role": "assistant", "content": Answer})  # Add the AI's response to messages.
        return Answer

    Topic: str = Topic.replace("Content", "")  # Remove "Content" from the topic.
    ContentByAI = ContentWriterAI(Topic)  # Generate content using AI.

    # Save the generated content to a text file.
    with open(rf"Data\{Topic.lower().replace(' ', '')}.txt", "w", encoding="utf-8") as file:
        file.write(ContentByAI)  # Write the content to the file.

    OpenNotepad(rf"Data\{Topic.lower().replace(' ', '')}.txt")  # Open the file in Notepad.
    return True  # Indicate success.

# Function to search for a topic on YouTube.
def YouTubeSearch(Topic):
    Url4Search = f"https://www.youtube.com/results?search_query={Topic}"  # Construct the YouTube search URL.
    webbrowser.open(Url4Search)  # Open the search URL in a web browser.
    return True  # Indicate success.

# Function to play a video on YouTube.
def PlayYoutube(query):
    playonyt(query)  # Use pywhatkit's playonyt function to play the video.
    return True  # Indicate success.

# Function to open an application or a relevant webpage.
def OpenApp(app, sess=requests.session()):

    try:
        appopen(app, match_closest=True, output=True, throw_error=True)  # Attempt to open the app.
        return True  # Indicate success.

    except:
        # Nested function to extract links from HTML content (used by the commented-out
        # HTML-scraping fallback below).
        def extract_links(html):
            if html is None:
                return []
            soup = BeautifulSoup(html, 'html.parser')  # Parse the HTML content.
            links = soup.find_all('a', {'jsname': 'UWckNb'})  # Find relevant links.
            return [link.get('href') for link in links]  # Return the links.

        # Nested function to perform a Google search and open the first result.
        def search_google(query):
            try:
                first_link = next(google_search(query))  # Get the first search result.
                print(f"Opening: {first_link}")
                webbrowser.open(first_link)  # Open the link in the default browser.
            except Exception as e:
                print(f"Error: {e}")
            # Earlier approach, kept for reference: fetch the results page HTML instead.
            # url = f"https://www.google.com/search?q={query}"  # Construct the Google search URL.
            # headers = {"User-Agent": useragent}  # Use the predefined user-agent.
            # response = sess.get(url, headers=headers)  # Perform the GET request.
            # if response.status_code == 200:
            #     return response.text  # Return the HTML content.
            # else:
            #     print("Failed to retrieve search results.")  # Print an error message.
            #     return None

        html = search_google(app)  # Perform the Google search.

        # Earlier approach, kept for reference: open the first extracted link.
        # if html:
        #     links = extract_links(html)
        #     if links:  # Ensure we have valid links before accessing index 0.
        #         webopen(links[0])  # Open the first search result.
        #     else:
        #         print(f"No valid links found for '{app}', opening Google search instead.")
        #         webopen(f"https://www.google.com/search?q={app}")  # Open Google search results page.
        # else:
        #     print("Failed to retrieve Google search results.")

        return True  # Indicate success.

# OpenApp("notepad")
# OpenApp("youtube")

# Function to close an application.
def CloseApp(app):

    if "chrome" in app:
        pass  # Skip if the app is Chrome.
    else:
        try:
            close(app, match_closest=True, output=True, throw_error=True)  # Attempt to close the app.
            return True  # Indicate success.
        except:
            return False  # Indicate failure.

# Function to execute system-level commands.
def System(command):

    # Nested function to mute the system volume.
    def mute():
        keyboard.press_and_release("volume mute")  # Simulate the mute key press.

    # Nested function to unmute the system volume.
    def unmute():
        keyboard.press_and_release("volume unmute")  # Simulate the unmute key press.

    # Nested function to increase the system volume.
    def volume_up():
        keyboard.press_and_release("volume up")  # Simulate the volume up key press.

    # Nested function to decrease the system volume.
    def volume_down():
        keyboard.press_and_release("volume down")  # Simulate the volume down key press.

    # Execute the appropriate command.
    if command == "mute":
        mute()
    elif command == "unmute":
        unmute()
    elif command == "volume up":
        volume_up()
    elif command == "volume down":
        volume_down()

    return True  # Indicate success.

# Asynchronous function to translate and execute user commands.
async def TranslateAndExecute(commands: list[str]):

    funcs = []  # List to store asynchronous tasks.

    for command in commands:

        if command.startswith("open"):  # Handle "open" commands.
            if "open it" in command or command == "open file":  # Ignore these commands.
                pass
            else:
                fun = asyncio.to_thread(OpenApp, command.removeprefix("open"))  # Schedule app opening.
                funcs.append(fun)

        elif command.startswith("general"):  # Placeholder for general commands.
            pass

        elif command.startswith("realtime"):  # Placeholder for real-time commands.
            pass

        elif command.startswith("close"):  # Handle "close" commands.
            fun = asyncio.to_thread(CloseApp, command.removeprefix("close"))  # Schedule app closing.
            funcs.append(fun)

        elif command.startswith("play"):  # Handle "play" commands.
            fun = asyncio.to_thread(PlayYoutube, command.removeprefix("play"))  # Schedule YouTube playback.
            funcs.append(fun)

        elif command.startswith("content"):  # Handle "content" commands.
            fun = asyncio.to_thread(Content, command.removeprefix("content"))  # Schedule content creation.
            funcs.append(fun)

        elif command.startswith("google search"):  # Handle Google search commands.
            fun = asyncio.to_thread(GoogleSearch, command.removeprefix("google search"))  # Schedule Google search.
            funcs.append(fun)

        elif command.startswith("youtube search"):  # Handle YouTube search commands.
            fun = asyncio.to_thread(YouTubeSearch, command.removeprefix("youtube search"))  # Schedule YouTube search.
            funcs.append(fun)

        elif command.startswith("system"):  # Handle system commands.
            fun = asyncio.to_thread(System, command.removeprefix("system"))  # Schedule system command.
            funcs.append(fun)

        else:
            print(f"No Function Found. For {command}")  # Print an error for unrecognized commands.

    results = await asyncio.gather(*funcs)  # Execute all tasks concurrently.

    for result in results:  # Yield each result, string or otherwise.
        yield result

# Asynchronous function to automate command execution.
async def Automation(commands: list[str]):

    async for result in TranslateAndExecute(commands):  # Translate and execute commands.
        pass

    return True  # Indicate success.

# if __name__ == "__main__":
#     asyncio.run(Automation(["open facebook", "open instagram", "open telegram",
#                             "play bewajah", "content resignation letter"]))

############################################################################
from groq import Groq # importing the groq library to use its api
from json import load, dump # imporing functions to read and write json files
import datetime # importing the database module for real-time date and time information
from dotenv import dotenv_values # importing dotenv_values to read environment variables from
a .env files

# load environment variables from the .env file


env_vars = dotenv_values(".env")

# Retrieve specific environment variables for username, assistant name, and API key.
Username = env_vars.get ("Username")
Assistantname = env_vars.get("Assistantname")
GroqAPIKey = env_vars.get("GroqAPIKey")

# Initialize the Groq client using the provided API key.


client = Groq(api_key=GroqAPIKey)

# Initialize an empty list to store chat messages.


messages = []

# Define a system message that provides context to the AI chatbot about its role and behavior.
System = f"""Hello, I am {Username}, You are a very accurate and advanced AI chatbot named
{Assistantname} which also has real-time up-to-date information from the internet.
*** Do not tell time until I ask, do not talk too much, just answer the question.***
*** Reply in only English, even if the question is in Hindi, reply in English.***
*** Do not provide notes in the output, just answer the question and never mention your training data.
***
"""

# A list of system instructions for the chatbot.
SystemChatBot = [
    {"role": "system", "content": System}
]

# Attempt to load the chat log from a JSON file.
try:
    with open(r"Data\ChatLog.json", "r") as f:
        messages = load(f)  # Load existing messages from the chat log.
except FileNotFoundError:
    # If the file doesn't exist, create an empty JSON file to store chat logs.
    with open(r"Data\ChatLog.json", "w") as f:
        dump([], f)

# Function to get real-time date and time information.
def RealtimeInformation():
    current_date_time = datetime.datetime.now()  # Get the current date and time.
    day = current_date_time.strftime("%A")       # Day of the week.
    date = current_date_time.strftime("%d")      # Day of the month.
    month = current_date_time.strftime("%B")     # Full month name.
    year = current_date_time.strftime("%Y")      # Year.
    hour = current_date_time.strftime("%H")      # Hour in 24-hour format.
    minute = current_date_time.strftime("%M")    # Minute.
    second = current_date_time.strftime("%S")    # Second.

    # Format the information into a string.
    data = f"Please use this real-time information if needed,\n"
    data += f"Day: {day}\nDate: {date}\nMonth: {month}\nYear: {year}\n"
    data += f"Time: {hour} hours : {minute} minutes : {second} seconds.\n"
    return data

# Function to modify the chatbot's response for better formatting.
def AnswerModifier(Answer):
    lines = Answer.split('\n')  # Split the response into lines.
    non_empty_lines = [line for line in lines if line.strip()]  # Remove empty lines.
    modified_answer = '\n'.join(non_empty_lines)  # Join the cleaned lines back together.
    return modified_answer

# Main chatbot function to handle user queries.
def ChatBot(Query):
"""This function sends the user's query to the chatbot and returns the AI's response."""

try:
# Load the existing chat log from the JSON file.
with open(r"Data\ChatLog.json", "r") as f:
messages = load(f)

        # Append the user's query to the messages list.
        messages.append({"role": "user", "content": f"{Query}"})

        # Make a request to the Groq API for a response.
completion = client.chat.completions.create(
model="llama3-70b-8192", # Specify the AI model to use.
messages=SystemChatBot + [{"role": "system", "content": RealtimeInformation()}] +
messages, # Include system instructions, real-time info, and chat history.
max_tokens=1024, # Limit the maximum tokens in the response.
temperature=0.7, # Adjust response randomness (higher means more random).
top_p=1, # Use nucleus sampling to control diversity.
stream=True, # Enable streaming response.
stop=None # Allow the model to determine when to stop.
)

Answer = "" # Initialize an empty string to store the AI's response.

        # Process the streamed response chunks.
        for chunk in completion:
if chunk.choices[0].delta.content: # Check if there's content in the current chunk.
Answer += chunk.choices[0].delta.content # Append the content to the answer.

Answer = Answer.replace("</s>", "") # Clean up any unwanted tokens from the response.

        # Append the chatbot's response to the messages list.
        messages.append({"role": "assistant", "content": Answer})

        # Save the updated chat log to the JSON file.
        with open(r"Data\ChatLog.json", "w") as f:
            dump(messages, f, indent=4)

        # Return the formatted response.
        return AnswerModifier(Answer=Answer)
    except Exception as e:
        # Handle errors by printing the exception and resetting the chat log.
        print(f"Error: {e}")
        with open(r"Data\ChatLog.json", "w") as f:
            dump([], f, indent=4)
        return ChatBot(Query)  # Retry the query after resetting the log.

# Main program entry point.
if __name__ == "__main__":
    while True:
        user_input = input("Enter Your Question: ")  # Prompt the user for a question.
        print(ChatBot(user_input))  # Call the chatbot function and print its response.
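
# For reference, every module shown here reads its settings from a single .env file in the
# project root via dotenv_values(".env"). A minimal sketch of that file is given below; the
# key names are the ones the code above actually reads, while the values are placeholders to
# be replaced with real ones (other modules of the project may define additional keys):
#
#     Username = YourName
#     Assistantname = Jarvis
#     GroqAPIKey = your-groq-api-key-here
#     InputLanguage = en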

#################################################################################

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from dotenv import dotenv_values
import os
import mtranslate as mt

# Load environment variables from the .env file.
env_vars = dotenv_values(".env")

# Get the input language setting from the environment variables.
InputLanguage = env_vars.get("InputLanguage")

# Define the HTML code for the speech recognition interface.
HtmlCode = '''<!DOCTYPE html>
<html lang="en">
<head>
<title>Speech Recognition</title>
</head>
<body>
<button id="start" onclick="startRecognition()">Start Recognition</button>
<button id="end" onclick="stopRecognition()">Stop Recognition</button>
<p id="output"></p>
<script>
const output = document.getElementById('output');
let recognition;

function startRecognition() {
            recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = '';
recognition.continuous = true;

recognition.onresult = function(event) {
const transcript = event.results[event.results.length - 1][0].transcript;
output.textContent += transcript;
};
recognition.onend = function() {
recognition.start();
};
recognition.start();
}

function stopRecognition() {
recognition.stop();
output.innerHTML = "";
}
</script>
</body>
</html>'''

# Replace the language setting in the HTML code with the input language from the environment variables.
HtmlCode = str(HtmlCode).replace("recognition.lang = '';", f"recognition.lang = '{InputLanguage}';")

# Write the modified HTML code to a file.
with open(r"Data\Voice.html", "w") as f:
    f.write(HtmlCode)

# Get the current working directory.
current_dir = os.getcwd()
# Generate the file path for the HTML file.
Link = f"{current_dir}/Data/Voice.html"

# Set Chrome options for the WebDriver.
chrome_options = Options()
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.142.86 Safari/537.36"
chrome_options.add_argument(f'user-agent={user_agent}')
chrome_options.add_argument("--use-fake-ui-for-media-stream")
chrome_options.add_argument("--use-fake-device-for-media-stream")
chrome_options.add_argument("--headless=new")
# Initialize the Chrome WebDriver using the ChromeDriverManager.
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)

# Define the path for temporary files.
TempDirPath = rf"{current_dir}/Frontend/Files"

# Function to set the assistant's status by writing it to a file.
def SetAssistantStatus(Status):
    with open(rf'{TempDirPath}/Status.data', "w", encoding='utf-8') as file:
        file.write(Status)

# Function to modify a query to ensure proper punctuation and formatting.
def QueryModifier(Query):
    new_query = Query.lower().strip()
    query_words = new_query.split()
    question_words = ["how", "what", "who", "where", "when", "why", "which", "whose", "whom", "can you", "what's", "where's", "how's"]

    # Check if the query is a question and add a question mark if necessary.
    if any(word + " " in new_query for word in question_words):
        if query_words[-1][-1] in ['.', '?', '!']:
            new_query = new_query[:-1] + "?"
        else:
            new_query += "?"
    else:
        # Add a period if the query is not a question.
        if query_words[-1][-1] in ['.', '?', '!']:
            new_query = new_query[:-1] + "."
        else:
            new_query += "."

    return new_query.capitalize()

# Function to translate text into English using the mtranslate library.
def UniversalTranslator(Text):
    english_translation = mt.translate(Text, "en", "auto")
    return english_translation.capitalize()

# Function to perform speech recognition using the WebDriver.
def SpeechRecognition():
    # Open the HTML file in the browser.
    driver.get("file:///" + Link)
    # Start speech recognition by clicking the start button.
    driver.find_element(by=By.ID, value="start").click()

    while True:
        try:
            # Get the recognized text from the HTML output element.
            Text = driver.find_element(by=By.ID, value="output").text

            if Text:
                # Stop recognition by clicking the stop button.
                driver.find_element(by=By.ID, value="end").click()

                # If the input language is English, return the modified query.
                if "en" in InputLanguage.lower():
                    return QueryModifier(Text)
                else:
                    # If the input language is not English, translate the text and return it.
                    SetAssistantStatus("Translating...")
                    return QueryModifier(UniversalTranslator(Text))
        except Exception:
            # Ignore transient read errors and keep polling for recognized text.
            pass

# Main execution block.
if __name__ == "__main__":
    while True:
        # Continuously perform speech recognition and print the recognized text.
        Text = SpeechRecognition()
        print(Text)

##################################################################################

from Frontend.GUI import (
    GraphicalUserInterface,
    SetAssistantStatus,
    ShowTextToScreen,
    TempDirectoryPath,
    SetMicrophoneStatus,
    AnswerModifier,
    QueryModifier,
    GetMicrophoneStatus,
    GetAssistantStatus)
from Backend.Model import FirstLayerDMM
from Backend.RealtimeSearchEngine import RealtimeSearchEngine
from Backend.Automation import Automation
from Backend.SpeechToText import SpeechRecognition
from Backend.Chatbot import ChatBot
from Backend.TextToSpeech import TextToSpeech
from dotenv import dotenv_values
from asyncio import run
from time import sleep
import subprocess
import threading
import json
import os

env_vars = dotenv_values(".env")
Username = env_vars.get("Username")
Assistantname = env_vars.get("Assistantname")
DefaultMessage = f'''{Username} : Hello {Assistantname}, How are you?
{Assistantname} : Welcome {Username}. I am doing well. How may I help you?'''
subprocesses = []
Functions = ["open", "close", "play", "system", "content", "google search", "youtube search"]

def ShowDefaultChatIfNoChats():
    with open(r'Data\ChatLog.json', "r", encoding='utf-8') as File:
        if len(File.read()) < 5:
            with open(TempDirectoryPath('Database.data'), 'w', encoding='utf-8') as file:
                file.write("")
            with open(TempDirectoryPath('Responses.data'), 'w', encoding='utf-8') as file:
                file.write(DefaultMessage)

def ReadChatLogJson():
    with open(r'Data\ChatLog.json', 'r', encoding='utf-8') as file:
        chatlog_data = json.load(file)
    return chatlog_data

def ChatLogIntegration():
    json_data = ReadChatLogJson()
    formatted_chatlog = ""
    for entry in json_data:
        if entry["role"] == "user":
            formatted_chatlog += f"User: {entry['content']}\n"
        elif entry["role"] == "assistant":
            formatted_chatlog += f"Assistant: {entry['content']}\n"
    formatted_chatlog = formatted_chatlog.replace("User", Username + " ")
    formatted_chatlog = formatted_chatlog.replace("Assistant", Assistantname + " ")

    with open(TempDirectoryPath('Database.data'), 'w', encoding='utf-8') as file:
        file.write(AnswerModifier(formatted_chatlog))

def ShowChatsOnGUI():
    with open(TempDirectoryPath('Database.data'), "r", encoding='utf-8') as File:
        Data = File.read()
    if len(str(Data)) > 0:
        lines = Data.split('\n')
        result = '\n'.join(lines)
        with open(TempDirectoryPath('Responses.data'), "w", encoding='utf-8') as File:
            File.write(result)

def InitialExecution():
SetMicrophoneStatus("False")
ShowTextToScreen("")
ShowDefaultChatIfNoChats()
ChatLogIntegration()
ShowChatsOnGUI()

InitialExecution()

def MainExecution():

    TaskExecution = False
    ImageExecution = False
    ImageGenerationQuery = ""

    SetAssistantStatus("Listening...")
    Query = SpeechRecognition()
    ShowTextToScreen(f"{Username} : {Query}")
    SetAssistantStatus("Thinking...")
    Decision = FirstLayerDMM(Query)

    print("")
    print(f"Decision : {Decision}")
    print("")

    G = any([i for i in Decision if i.startswith("general")])
    R = any([i for i in Decision if i.startswith("realtime")])

    Merged_query = " and ".join(
        [" ".join(i.split()[1:]) for i in Decision if i.startswith("general") or i.startswith("realtime")]
    )

    for queries in Decision:
        if "generate" in queries:
            ImageGenerationQuery = str(queries)
            ImageExecution = True

    for queries in Decision:
        if not TaskExecution:
            if any(queries.startswith(func) for func in Functions):
                run(Automation(list(Decision)))
                TaskExecution = True

    if ImageExecution:

        with open(r"Frontend\Files\ImageGeneration.data", "w") as file:
            file.write(f"{ImageGenerationQuery},True")

        try:
            p1 = subprocess.Popen(['python', r'Backend\ImageGeneration.py'],
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                  stdin=subprocess.PIPE, shell=False)
            subprocesses.append(p1)

        except Exception as e:
            print(f"Error starting ImageGeneration.py: {e}")

    if R:  # Real-time queries (alone or combined with general ones) go to the search engine.

        SetAssistantStatus("Searching...")
        Answer = RealtimeSearchEngine(QueryModifier(Merged_query))
        ShowTextToScreen(f"{Assistantname} : {Answer}")
        SetAssistantStatus("Answering...")
        TextToSpeech(Answer)
        return True

    else:
        for Queries in Decision:
            if "general" in Queries:
                SetAssistantStatus("Thinking...")
                QueryFinal = Queries.replace("general ", "")
                Answer = ChatBot(QueryModifier(QueryFinal))
                ShowTextToScreen(f"{Assistantname} : {Answer}")
                SetAssistantStatus("Answering...")
                TextToSpeech(Answer)
                return True

elif "realtime" in Queries :


SetAssistantStatus("Searching... ")
QueryFinal = Queries.replace( "realtime ", "")
Answer = RealtimeSearchEngine(QueryModifier(QueryFinal))
ShowTextToScreen(f"{Assistantname}:{Answer}")
SetAssistantStatus("Answering... ")
TextToSpeech(Answer)
return True

elif "exit" in Queries:


QueryFinal = "Okay, Bye!"
Answer = ChatBot(QueryModifier(QueryFinal))
ShowTextToScreen(f"{Assistantname}:{Answer}")
SetAssistantStatus("Answering...")
TextToSpeech(Answer)
SetAssistantStatus("Answering... ")
os._exit(1)

def FirstThread():

    while True:

        CurrentStatus = GetMicrophoneStatus()

        if CurrentStatus == "True":
            MainExecution()

        else:
            AIStatus = GetAssistantStatus()

            if "Available..." in AIStatus:
                sleep(0.1)
            else:
                SetAssistantStatus("Available...")

def SecondThread():

GraphicalUserInterface()

if __name__ == "__main__":
thread2 = threading.Thread(target=FirstThread, daemon=True)
thread2.start()
SecondThread()

################################################################################

from PyQt5.QtWidgets import (QApplication, QMainWindow, QTextEdit, QStackedWidget,
                             QWidget, QLineEdit, QGridLayout, QVBoxLayout, QHBoxLayout,
                             QPushButton, QFrame, QLabel, QSizePolicy)
from PyQt5.QtGui import (QIcon, QPainter, QMovie, QColor, QTextCharFormat, QFont,
                         QPixmap, QTextBlockFormat)
from PyQt5.QtCore import Qt, QSize, QTimer
from dotenv import dotenv_values
import sys
import os

env_vars = dotenv_values (".env")


Assistantname = env_vars.get("Assistantname")
current_dir = os.getcwd()
old_chat_message = ""
TempDirPath = rf"{current_dir}\Frontend\Files"
GraphicsDirPath = rf"{current_dir}\Frontend\Graphics"

def AnswerModifier(Answer):
lines = Answer.split('\n')
non_empty_lines = [line for line in lines if line.strip()]
modified_answer = '\n'.join(non_empty_lines)
return modified_answer

def QueryModifier(Query):

    new_query = Query.lower().strip()
    query_words = new_query.split()
    question_words = ["how", "what", "who", "where", "when", "why", "which", "whose", "whom", "can you", "what's", "where's", "how's"]

    # Check if the query is a question and add a question mark if necessary.
    if any(word + " " in new_query for word in question_words):
        if query_words[-1][-1] in ['.', '?', '!']:
            new_query = new_query[:-1] + "?"
        else:
            new_query += "?"
    else:
        # Add a period if the query is not a question.
        if query_words[-1][-1] in ['.', '?', '!']:
            new_query = new_query[:-1] + "."
        else:
            new_query += "."

    return new_query.capitalize()


def SetMicrophoneStatus(Command):
    with open(rf'{TempDirPath}\Mic.data', "w", encoding='utf-8') as file:
        file.write(Command)

def GetMicrophoneStatus():
    with open(rf'{TempDirPath}\Mic.data', "r", encoding='utf-8') as file:
        Status = file.read()
    return Status

def SetAssistantStatus(Status):
    with open(rf'{TempDirPath}\Status.data', "w", encoding='utf-8') as file:
        file.write(Status)

def GetAssistantStatus():
    with open(rf'{TempDirPath}\Status.data', "r", encoding='utf-8') as file:
        Status = file.read()
    return Status

def MicButtonInitialed():
    SetMicrophoneStatus("False")  # "False" marks the microphone as inactive.

def MicButtonClosed():
    SetMicrophoneStatus("True")  # "True" marks the microphone as active.

def GraphicsDirectoryPath(Filename):
Path = rf'{GraphicsDirPath}\{Filename}'
return Path

def TempDirectoryPath(Filename):
Path = rf'{TempDirPath}\{Filename}'
return Path

def ShowTextToScreen(Text):
with open(rf'{TempDirPath}\Responses.data', "w", encoding='utf-8') as file:
file.write(Text)

class ChatSection(QWidget):

    def __init__(self):
        super(ChatSection, self).__init__()
        layout = QVBoxLayout(self)
        layout.setContentsMargins(-10, 40, 40, 100)
        layout.setSpacing(-100)
        self.chat_text_edit = QTextEdit()
        self.chat_text_edit.setReadOnly(True)
        self.chat_text_edit.setTextInteractionFlags(Qt.NoTextInteraction)  # No text interaction.
        self.chat_text_edit.setFrameStyle(QFrame.NoFrame)
        layout.addWidget(self.chat_text_edit)
        self.setStyleSheet("background-color: black;")
        layout.setSizeConstraint(QVBoxLayout.SetDefaultConstraint)
        layout.setStretch(1, 1)
        self.setSizePolicy(QSizePolicy(QSizePolicy.Expanding, QSizePolicy.Expanding))
        text_color = QColor(Qt.blue)
        text_color_text = QTextCharFormat()
        text_color_text.setForeground(text_color)
        self.chat_text_edit.setCurrentCharFormat(text_color_text)
        self.gif_label = QLabel()
        self.gif_label.setStyleSheet("border: none;")
        movie = QMovie(GraphicsDirectoryPath('Jarvis.gif'))
        max_gif_size_W = 480
        max_gif_size_H = 270
        movie.setScaledSize(QSize(max_gif_size_W, max_gif_size_H))
        self.gif_label.setAlignment(Qt.AlignRight | Qt.AlignBottom)
        self.gif_label.setMovie(movie)
        movie.start()
        layout.addWidget(self.gif_label)
        self.label = QLabel("")
        self.label.setStyleSheet("color: white; font-size: 16px; margin-right: 195px; border: none; margin-top: -30px;")
        self.label.setAlignment(Qt.AlignRight)
        layout.addWidget(self.label)
        layout.setSpacing(-10)
font = QFont()
font.setPointSize(13)
self.chat_text_edit.setFont(font)
self.timer = QTimer(self)
self.timer.timeout.connect(self.loadMessages)
self.timer.timeout.connect(self.SpeechRecogText)
self.timer.start(5)
self.chat_text_edit.viewport().installEventFilter(self)
self.setStyleSheet("""
QScrollBar:vertical {
border:none;
background:black;
width:10px;
margin:0px 0px 0px 0px;
}

QScrollBar::handle:vertical {
background:white;
min-height:20px;
}

QScrollBar::add-line:vertical {
background:black;
subcontrol-position:bottom;
subcontrol-origin:margin;
height:10px;
}

QScrollBar::sub-line:vertical {
background:black;
subcontrol-position:top;
subcontrol-origin:margin;
height:10px;
}

QScrollBar::up-arrow:vertical, QScrollBar::down-arrow:vertical {
border:none;
background:none;
color:none;
}

QScrollBar::add-page:vertical, QScrollBar::sub-page:vertical {
background:none;
}
""")

    def loadMessages(self):
        global old_chat_message

        with open(TempDirectoryPath('Responses.data'), "r", encoding='utf-8') as file:
            messages = file.read()

            if messages is None:
                pass
            elif len(messages) <= 1:
                pass
            elif str(old_chat_message) == str(messages):
                pass
            else:
                self.addMessage(message=messages, color='White')
                old_chat_message = messages

def SpeechRecogText(self):
with open(TempDirectoryPath('Status.data'), "r", encoding='utf-8') as file:
messages = file.read()
self.label.setText(messages)
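
    # Note: load_icon and toggle_icon below assume an icon_label attribute that ChatSection
    # itself never creates; they mirror the InitialScreen handlers and appear to be unused here.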

    def load_icon(self, path, width=60, height=60):
        pixmap = QPixmap(path)
new_pixmap = pixmap.scaled(width, height)
self.icon_label.setPixmap(new_pixmap)

def toggle_icon(self, event=None):

if self.toggled:
self.load_icon(GraphicsDirectoryPath('voice.png'), 60, 60)
MicButtonInitialed()

else:
self.load_icon(GraphicsDirectoryPath('mic.png'), 60, 60)
MicButtonClosed()

self.toggled = not self.toggled

    def addMessage(self, message, color):
        cursor = self.chat_text_edit.textCursor()
        char_format = QTextCharFormat()
        block_format = QTextBlockFormat()
        block_format.setTopMargin(10)
        block_format.setLeftMargin(10)
        char_format.setForeground(QColor(color))
        cursor.setCharFormat(char_format)
        cursor.setBlockFormat(block_format)
        cursor.insertText(message + "\n")
        self.chat_text_edit.setTextCursor(cursor)

class InitialScreen(QWidget):

    def __init__(self, parent=None):
        super().__init__(parent)
        desktop = QApplication.desktop()
        screen_width = desktop.screenGeometry().width()
        screen_height = desktop.screenGeometry().height()
        content_layout = QVBoxLayout()
content_layout.setContentsMargins(0, 0, 0, 0)
gif_label = QLabel()
movie = QMovie(GraphicsDirectoryPath('Jarvis.gif'))
gif_label.setMovie(movie)
max_gif_size_H = int(screen_width / 16 * 9)
movie.setScaledSize(QSize(screen_width, max_gif_size_H))
gif_label.setAlignment(Qt.AlignCenter)
movie.start()
gif_label.setSizePolicy(QSizePolicy.Expanding, QSizePolicy.Expanding)
self.icon_label = QLabel()
pixmap = QPixmap(GraphicsDirectoryPath('Mic_on.png'))
new_pixmap = pixmap.scaled(60, 60)
self.icon_label.setPixmap(new_pixmap)
self.icon_label.setFixedSize(150,150)
self.icon_label.setAlignment(Qt.AlignCenter)
self.toggled = True
self.toggle_icon()
self.icon_label.mousePressEvent = self.toggle_icon
self.label = QLabel("")
self.label.setStyleSheet("color: white; font-size: 16px; margin-bottom:0;")
content_layout.addWidget(gif_label, alignment=Qt.AlignCenter)
content_layout.addWidget(self.label, alignment=Qt.AlignCenter)
content_layout.addWidget(self.icon_label, alignment=Qt.AlignCenter)
content_layout.setContentsMargins(0, 0, 0, 150)
self.setLayout(content_layout)
self.setFixedHeight(screen_height)
self.setFixedWidth(screen_width)
self.setStyleSheet("background-color: black; ")
self.timer = QTimer(self)
self.timer.timeout.connect(self.SpeechRecogText)
self.timer.start(5)

def SpeechRecogText(self):
with open(TempDirectoryPath('Status.data'), "r", encoding='utf-8') as file:
messages = file.read()
self.label.setText(messages)

    def load_icon(self, path, width=60, height=60):
        pixmap = QPixmap(path)
new_pixmap = pixmap.scaled(width, height)
self.icon_label.setPixmap(new_pixmap)

def toggle_icon(self, event=None):

if self.toggled:
self.load_icon(GraphicsDirectoryPath('Mic_on.png'), 60, 60)
MicButtonInitialed()

else:
self.load_icon(GraphicsDirectoryPath('Mic_off.png'), 60, 60)
MicButtonClosed()

self.toggled = not self.toggled

class MessageScreen(QWidget):
def __init__(self, parent=None):
super().__init__(parent)
desktop = QApplication.desktop()
screen_width = desktop.screenGeometry().width()
screen_height = desktop.screenGeometry().height()
layout = QVBoxLayout()
label = QLabel("")
layout.addWidget(label)
chat_section = ChatSection()
layout.addWidget(chat_section)
self.setLayout(layout)
self.setStyleSheet("background-color: black;")
self.setFixedHeight(screen_height)
self.setFixedWidth(screen_width)

class CustomTopBar(QWidget):
def __init__(self, parent, stacked_widget):
super().__init__(parent)
self.initUI()
self.current_screen = None
self.stacked_widget = stacked_widget

    def initUI(self):
        self.setFixedHeight(50)
        layout = QHBoxLayout(self)
        layout.setAlignment(Qt.AlignRight)
        home_button = QPushButton()
        home_icon = QIcon(GraphicsDirectoryPath("Home.png"))
        home_button.setIcon(home_icon)
        home_button.setText("Home")
        home_button.setStyleSheet("height:40px; line-height:40px; background-color:white; color:black")
        message_button = QPushButton()
        message_icon = QIcon(GraphicsDirectoryPath("Chats.png"))
        message_button.setIcon(message_icon)
        message_button.setText("Chat")
        message_button.setStyleSheet("height:40px; line-height:40px; background-color:white; color:black")
minimize_button = QPushButton()
minimize_icon = QIcon(GraphicsDirectoryPath('Minimize2.png'))
minimize_button.setIcon(minimize_icon)
minimize_button.setStyleSheet("background-color:white")
minimize_button.clicked.connect(self.minimizeWindow)
self.maximize_button = QPushButton()
self.maximize_icon = QIcon(GraphicsDirectoryPath('Maximize.png'))
self.restore_icon = QIcon(GraphicsDirectoryPath('Minimize.png'))
self.maximize_button.setIcon(self.maximize_icon)
self.maximize_button.setFlat(True)
self.maximize_button.setStyleSheet("background-color:white")
self.maximize_button.clicked.connect(self.maximizeWindow)
close_button = QPushButton()
close_icon = QIcon(GraphicsDirectoryPath('Close.png'))
close_button.setIcon(close_icon)
close_button.setStyleSheet("background-color:white")
close_button.clicked.connect(self.closeWindow)
line_frame = QFrame()
line_frame.setFixedHeight(1)
line_frame.setFrameShape(QFrame.HLine)
line_frame.setFrameShadow(QFrame.Sunken)
line_frame.setStyleSheet("border-color: black;")
title_label = QLabel(f"{str(Assistantname).capitalize()} ") # Advanced Virtual Assistant
Name
title_label.setStyleSheet ("color:black; font-size:18px; background-color:white")
home_button.clicked.connect(lambda:self.stacked_widget.setCurrentIndex(0))
message_button.clicked.connect(lambda:self.stacked_widget.setCurrentIndex(1))
layout.addWidget(title_label)
layout.addStretch(1)
layout.addWidget(home_button)
layout.addWidget(message_button)
layout.addStretch(1)
layout.addWidget(minimize_button)
layout.addWidget(self.maximize_button)
layout.addWidget(close_button)
layout.addWidget(line_frame)
self.draggable = True
self.offset = None

    def paintEvent(self, event):
        painter = QPainter(self)
painter.fillRect(self.rect(), Qt.white)
super().paintEvent(event)

def minimizeWindow(self):
self.parent().showMinimized()

def maximizeWindow(self):
if self.parent().isMaximized():
self.parent().showNormal()
self.maximize_button.setIcon(self.maximize_icon)
        else:
self.parent().showMaximized()
self.maximize_button.setIcon(self.restore_icon)

def closeWindow(self):
self.parent().close()

    def mousePressEvent(self, event):
        if self.draggable:
            self.offset = event.pos()

    def mouseMoveEvent(self, event):
        if self.draggable and self.offset:
new_pos = event.globalPos() - self.offset
self.parent().move(new_pos)

def showMessageScreen(self):
if self.current_screen is not None:
self.current_screen.hide()

message_screen = MessageScreen(self)
layout = self.parent().layout()
if layout is not None:
layout.addWidget(message_screen)
self.current_screen = message_screen

class MainWindow(QMainWindow):
def __init__(self):
super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
self.initUI()

def initUI(self):
desktop = QApplication.desktop()
screen_width = desktop.screenGeometry().width()
        screen_height = desktop.screenGeometry().height()
stacked_widget = QStackedWidget(self)
initial_screen = InitialScreen()
        message_screen = MessageScreen()
stacked_widget.addWidget(initial_screen)
stacked_widget.addWidget(message_screen)
self.setGeometry(0, 0, screen_width, screen_height)
self.setStyleSheet("background-color: black;")
top_bar = CustomTopBar(self, stacked_widget)
self.setMenuWidget(top_bar)
self.setCentralWidget(stacked_widget)

def GraphicalUserInterface():
app = QApplication(sys.argv)
window = MainWindow()
window.show()
sys.exit(app.exec_())

if __name__ == "__main__":
GraphicalUserInterface()

Chapter 9

OUTPUT
Chapter 10

SYSTEM OVERVIEW
10.1 Advantages
1. Instant Access to Information: Users can quickly get answers to questions, check weather
forecasts, news updates, traffic conditions, and more, simply by asking. It is like having an
expert available instantly.

2. Personalization: Voice assistants can learn and adapt to individual user preferences, habits,
and past interactions over time. This allows them to provide personalized recommendations,
tailored responses, and proactive assistance.

3. Natural Interaction: Interacting with a voice assistant through spoken commands feels
more natural and intuitive than traditional input methods. This makes technology more
user-friendly for a wider demographic, including children and the elderly.

4. Multi-Platform Integration: Voice assistants seamlessly integrate with various platforms
and devices, including smartphones, smart speakers, smart TVs, and other IoT devices,
creating a connected ecosystem.

5. Multilingual Support: Many voice assistants support multiple languages and dialects,
breaking down language barriers and catering to a global audience.

10.2 Disadvantages of existing system

1. Performance: Voice assistants may have limitations in terms of their performance, such as
the speed at which they can process and respond to user requests, or the complexity of tasks
that they can handle.

2. Privacy and security: Voice assistants may not always clearly communicate their data collection
and sharing practices to users, which could raise concerns about transparency and
consent.

3. Customization: Voice assistants may not offer users a high degree of customization or
control over their functionality, which could limit their usefulness and appeal to users.

4. Accuracy: Voice assistants may not always accurately understand or respond to user
requests and queries, which can lead to frustration and a poor user experience.

5. Capabilities: Voice assistants may not support all tasks or functions that users may want to
perform, and they may not be able to integrate with all devices or systems.

10.3 Features
• It can fetch real-time information such as news headlines, weather reports, the IP address,
internet speed, and system stats.

• It can fetch entertaining content such as jokes and the latest movies or TV series, and play
songs and videos on YouTube.

• It can generate an image from a given text prompt and send an email. It can perform
system operations such as opening/closing/switching tabs, copying/pasting/deleting/selecting
text, creating a new file, taking screenshots, and minimizing/maximizing/switching/closing
windows.

• It can give brief information on any topic, perform arithmetic operations, and answer
general knowledge questions.

• It can perform a Google search and find a place or the distance between two places on
Google Maps (a minimal sketch follows this list).

• The chat history, along with the date and time of each query, can also be retrieved.

• It can also open any installed application or website through a voice command.
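
As an illustration, the following minimal sketch shows one way the Google search and Google
Maps commands can be implemented with Python's standard webbrowser module (also listed in
the bibliography). The function names and URL patterns here are illustrative assumptions, not
the exact code of the assistant:

import webbrowser
from urllib.parse import quote_plus

def google_search(query: str) -> None:
    # Open a Google search for the spoken query in the default browser.
    webbrowser.open(f"https://www.google.com/search?q={quote_plus(query)}")

def maps_distance(source: str, destination: str) -> None:
    # Open Google Maps directions between two spoken place names.
    webbrowser.open(f"https://www.google.com/maps/dir/{quote_plus(source)}/{quote_plus(destination)}")

if __name__ == "__main__":
    google_search("weather in Pune")
    maps_distance("Rajuri", "Pune")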

10.4 Future Scope:

We are entering an era in which implementing voice-activated technologies is necessary to
remain relevant and competitive. Voice-activation technology is vital not only for businesses
to stay relevant with their target customers, but also for internal operations. The technology
can be used to automate routine human operations, saving time for everyone. Tasks such as
sending basic emails or scheduling appointments can be completed more quickly, with less
effort, and without the use of a computer, just by employing a simple voice command. People
can multitask as a result, enhancing their productivity. Furthermore, relieving employees from
hours of tedious administrative tasks allows them to devote more time to strategy meetings,
brainstorming sessions, and other jobs that need creativity and human interaction.

1) Sending emails with a voice assistant:

Emails are crucial for professional communication, and one of the most widely used services
for sending and receiving them is Gmail, Google's free email service. Gmail can be accessed
over the web or through third-party apps that use the POP or IMAP protocols to synchronize
email content. To integrate Gmail with the voice assistant, we can utilize the Gmail API,
which allows an application to access and control threads, messages, and labels in a Gmail
mailbox. A simpler alternative, already used elsewhere in this project, is plain SMTP, as
sketched below.
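
The following is a minimal sketch of sending a voice-dictated email over SMTP using Python's
standard smtplib and email modules. The addresses, app password, subject, and body shown are
placeholders; in the assistant they would be filled in from the recognized voice command:

import smtplib
from email.message import EmailMessage

def send_email(subject: str, body: str, recipient: str) -> None:
    # Compose the email message.
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "sender@gmail.com"  # Placeholder sender address.
    msg["To"] = recipient
    msg.set_content(body)

    # Connect to Gmail's SMTP server over SSL and send the message.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login("sender@gmail.com", "app-password-here")  # Placeholder credentials.
        server.send_message(msg)

if __name__ == "__main__":
    send_email("Meeting reminder", "The project review is at 10 am tomorrow.", "receiver@example.com")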

2) Scheduling appointments using a voice assistant:

The demands on our time increase as an organization grows: more people want to meet with
us, and more people rely on us. We must check in on certain projects or set aside time to chat
with potential business leads. There will not be enough hours in the day if we keep doing
things the old way; we need a better handle on our schedule and a strategy for arranging
appointments that does not interfere with our most critical work. By working with a virtual
scheduler, in other words a virtual assistant, we let it worry about organizing and prioritizing
our schedule while we focus on the work.
Chapter 11

CONCLUSION

In conclusion, the voice assistant developed in this project is capable of performing various
tasks such as browsing the internet, sending emails, generating images, and interacting with
the user through conversation. It does so by utilizing various APIs and technologies such as
Stability AI's image-generation API, Google Speech Recognition, and SMTP.
The voice assistant is also able to perform system tasks such as opening and closing tabs,
windows, and applications, as well as taking screenshots and manipulating text in the
clipboard.
Chapter 12

BIBLIOGRAPHY

• Amrita S. Tulshan, Sudhir Namdeorao Dhage, "Survey on Virtual Assistant: Google
Assistant, Siri, Alexa," International Research Journal of Engineering and Technology
(IRJET), 2019.

• Manjusha Jadhav, Krushna Kalyankar, Gnaesh Narkhede, Swapnil Kharose, "Smart
Virtual Voice Assistant," International Journal of Research in Engineering, Science and
Management (IJRESM), 2020.

• S. Lahari, A. Naveen, G. Sarath Chandra, "Survey on Personal Voice Assistant,"
International Journal of Advanced Research in Computer and Communication Engineering
(IJARCCE), 2021.

• Prof. Suresh V. Reddy, Chandresh Chhari, Prajwal Wakde, Nikhil Kamble, "Personal
Desktop Virtual Voice Assistant using Python," IJCRT, 2022.

• Python Software Foundation, "Python Language Reference Manual", [Online]. Available:
https://www.python.org

• Pyttsx3 Documentation, [Online]. Available: https://pypi.org/project/pyttsx3/

• SpeechRecognition Library Documentation, [Online]. Available:
https://pypi.org/project/SpeechRecognition/

• Wikipedia API for Python, [Online]. Available: https://pypi.org/project/wikipedia/

• Webbrowser Module – Python Docs, [Online]. Available:
https://docs.python.org/3/library/webbrowser.html

• Requests Library – Python HTTP for Humans, [Online]. Available:
https://docs.python-requests.org/

• PyWhatKit Documentation, [Online]. Available: https://pypi.org/project/pywhatkit/

• PyAutoGUI Documentation, [Online]. Available: https://pyautogui.readthedocs.io/

• OpenWeatherMap API, [Online]. Available: https://openweathermap.org/api

• TMDB API (The Movie Database), [Online]. Available:
https://developer.themoviedb.org/docs

• Selenium with Python, [Online]. Available: https://selenium-python.readthedocs.io/

• Groq API – Large Language Models, [Online]. Available: https://groq.com/

Chapter 13

OTHER DOCUMENTATION
13.1 Base paper
13.2 Published paper
13.3 Plagiarism
