Mini Project Report
on
     AI base Desktop Assistant using Python
                      by
  Shubham Shidheshwar Aute (2030331246044)
     Rohit Dinkar Gaikwad (2030331246036)
      Suyog Yogesh Deore (2030331246047)
      Darvesh Singh Dogra (2030331246060)
     Department of Information Technology
Dr. Babasaheb Ambedkar Technological University,
   Lonere-402103, Dist. Raigad, (MS) INDIA.
                             Certificate
Mini Project report on AI base Desktop Assistant using Python,
submitted by Shubham Aute, Rohit Gaikwad, Suyog Deore, and Darvesh
Dogra, is approved for the partial fulfillment of the requirements for
the degree of B.Tech. in Information Technology of Dr. Babasaheb
Ambedkar Technological University, Lonere - 402 103, Raigad (MS).
   Examiner(s)
(1) —————————————————————— Sign.: ——————–
(2) —————————————————————— Sign.: ——————–
   Ms. Sonali V. Bharad                              Dr. Sanjay R. Sutar
           Guide                                     Head of Department
Place: Dr. Babasaheb Ambedkar Technological University, Lonere - 402 103.
                   Acknowledgments
Our first and foremost words of recognition go to our highly esteemed
Guide for her constructive academic advice and guidance, constant
encouragement and valuable suggestions, and all other support and
kindness to us. Her supervision and guidance proved most valuable in
overcoming the hurdles in the completion of this report.
  We are also thankful to the Head of the Department, Dr. Sanjay R.
Sutar, for his guidance and valuable suggestions. We would also like to
thank our departmental staff and library staff for their timely help.
  Finally, we would like to thank all whose direct and indirect support
helped us complete this report in time.
Shubham Shidheshwar Aute (2030331246044)
Rohit Dinkar Gaikwad (2030331246036)
Suyog Yogesh Deore (2030331246047)
Darvesh Singh Dogra (2030331246060)
                             Abstract
We live in the era of computers, and most of us have wondered how
convenient it would be to have our own virtual A.I. assistant. Imagine
how much easier it would be to send emails without typing a single word,
search Wikipedia without opening a web browser, and perform many other
daily tasks with a single voice command. Voice assistants are now
everywhere, and voice-based artificial intelligence is set to play an
important role in our daily lives. Examples include Siri on Apple
devices, Cortana on Microsoft devices, and Google Assistant on Android
devices. There are also devices dedicated to providing virtual assistance.
  Virtual assistants are typically cloud-based programs that require
internet-connected devices and/or applications to work. They typically
perform simple jobs for end users, such as adding tasks to a calendar or
providing information that would normally be searched for in a web
browser, as well as slightly more complex tasks such as checking the
status of smart home devices. In this project, the work begins by
analyzing the audio commands given by the user via the microphone. The
speech engine is set up so that it can convert text to speech using
existing Python libraries. Speech recognition is used to convert the
speech input to text. This text is then fed to the model, which
determines the nature of the command and calls the relevant script for
execution. This is, in essence, what happens when the assistant receives
a command from the user.
Contents
1 Introduction
    1.1   Background
    1.2   Objectives
    1.3   Purpose, Scope and Applicability
    1.4   How it Works
2 Introduction to AI
    2.1   What is Artificial Intelligence?
    2.2   Types of Artificial Intelligence
3 Software Requirements Specification
    3.1   Hardware Requirement:
    3.2   Software Requirement:
4 Implementation
5 System Design
    5.1   ER Diagram
    5.2   Activity Diagram
    5.3   Class Diagram
    5.4   Use Case Diagram
    5.5   Sequence Diagram
6 Result
    6.1   User Interface
    6.2   Output
7 BIBLIOGRAPHY
Chapter 1
Introduction
In today’s era almost all tasks are digitalized. We have a smartphone in
our hands, and it is nothing less than having the world at our fingertips.
These days we do not even need our fingers: we just speak the task and it
is done. There exist systems where we can say, “Text Dad, I’ll be late
today,” and the text is sent. That is the task of a virtual assistant.
   Virtual assistants are software programs that ease day-to-day tasks
such as showing the weather report, creating reminders, and making
shopping lists. They can take commands via text (online chatbots) or by
voice. Voice-based intelligent assistants need an invoking word, or wake
word, to activate the listener, followed by the command.
   This system is designed to be used efficiently on desktops. Personal
assistant software improves user productivity by managing the user’s
routine tasks and by providing information from online sources. “Helper”
is effortless to use. It lets the intelligent assistant make email work
for the user: it can detect intent, pick out important information,
automate processes, and deliver personalized responses.
   This project was started on the premise that there is a sufficient
amount of openly available data and information on the web that can be
used to build a virtual assistant capable of making intelligent decisions
for routine user activities.
1.1    Background
A number of desktop virtual assistants already exist. An example of a
virtual assistant currently available on the market is discussed in this
section, along with the tasks it supports and its drawbacks.
SIRI from Apple
   SIRI is personal assistant software that interacts with the user
through a voice interface, recognizes commands, and acts on them. It
learns to adapt to the user’s speech and thus improves voice recognition
over time. It also tries to converse with the user when it cannot
identify the user’s request. It integrates with the calendar, contacts,
and music library applications on the device, as well as with the GPS and
camera. It uses location, temporal, social, and task-based contexts to
personalize the agent’s behavior to the user at a given point in time.
Supported Tasks
  • Call someone from my contacts list
  • Launch an application on my iPhone
  • Send a text message to someone
  • Set up a meeting on my calendar for 9am tomorrow
  • Set an alarm for 5am tomorrow morning
  • Play a specific song in my iTunes library
  • Enter a new note
Drawback
   SIRI does not maintain a knowledge database of its own; its
understanding comes from the information captured in domain models and
data models.
1.2      Objectives
The main objective of building personal assistant software (a virtual
assistant) is to use semantic data sources available on the web and
user-generated content, and to provide knowledge from knowledge databases.
The main purpose of an intelligent virtual assistant is to answer
questions that users may have. This may be done in a business
environment, for example on a business website with a chat interface. On
the mobile platform, the intelligent virtual assistant is available as a
call-button operated service where a voice asks the user “What can I do
for you?” and then responds to verbal input.
   One of the main advantages of voice search is its speed. Voice is
reputed to be about four times faster than typing: whereas we can write
about 40 words per minute, we can speak around 150 words in the same
period of time. In this respect, the ability of personal assistants to
accurately recognize spoken words is a prerequisite for them to be
adopted by consumers.
1.3     Purpose, Scope and Applicability
Purpose
   The purpose of a virtual assistant is to be capable of voice
interaction, music playback, making to-do lists, setting alarms,
streaming podcasts, playing audiobooks, and providing weather, traffic,
sports, and other real-time information, such as news. Virtual assistants
enable users to speak natural language voice commands in order to operate
the device and its apps.
   There is increased overall awareness of, and a higher level of comfort
with, voice technology, demonstrated specifically by millennial
consumers. In this ever-evolving digital world where speed, efficiency,
and convenience are constantly being optimized, it is clear that we are
moving towards less screen interaction.
Scope
   Voice assistants will continue to offer more individualized experiences
as they get better at differentiating between voices. However, it is not
just developers who need to address the complexity of developing for
voice; brands also need to understand the capabilities of each device and
integration, and whether it makes sense for their specific brand. They
will also need to focus on maintaining a consistent user experience in
the coming years as complexity becomes more of a concern, because voice
assistants lack a visual interface: users simply cannot see or touch a
voice interface.
Applicability
   The mass adoption of artificial intelligence in users’ everyday lives
is also fueling the shift towards voice. The growing number of IoT
devices, such as smart thermostats and speakers, is giving voice
assistants more utility in a connected user’s life. Smart speakers are
the most common way voice is being used. Many industry experts even
predict that nearly every application will integrate voice technology in
some way in the next 5 years.
   The use of virtual assistants can also enhance IoT (Internet of
Things) systems. Twenty years from now, Microsoft and its competitors
will be offering personal digital assistants that provide the services of
a full-time employee, something usually reserved for the rich and famous.
1.4     How it Works
  • The user asks the assistant to perform a task.
  • The natural language audio signal is converted into digital data that
      can be analyzed by the software.
  • This data is compared with the software’s database using an
      algorithm to find a suitable answer.
  • This database is located on distributed servers in cloud networks.
  • For this reason, the assistant must have a reliable Internet
      connection.
Chapter 2
Introduction to AI
2.1     What is Artificial Intelligence?
In the simplest terms, AI, which stands for artificial intelligence,
refers to systems or machines that mimic human intelligence to perform
tasks and can iteratively improve themselves based on the information
they collect. AI manifests in a number of forms. A few examples are:
  • Chatbots use AI to understand customer problems faster and provide
      more efficient answers
  • Intelligent assistants use AI to parse critical information from large
      free-text datasets to improve scheduling
  • Recommendation engines can provide automated recommendations
      for TV shows based on users’ viewing habits
2.2   Types of Artificial Intelligence
Based on Capabilities
 • Weak AI or Narrow AI
 • General AI
 • Super AI
Based on Functionality
 • Reactive Machines
 • Limited Memory
 • Theory of Mind
Chapter 3
Software Requirements Specification
The system is built keeping in mind generally available hardware and
software compatibility. It does not require any expensive hardware
devices. The minimum hardware and software requirements for the system
are listed below.
3.1     Hardware Requirement:
  • Processor - minimum Intel(R) Core(TM) i5-11500U CPU @ 2.20GHz
      2.60 GHz
  • Processor speed - minimum 2 GHz
  • RAM - minimum 8.00 GB
3.2     Software Requirement:
  • pyttsx3 - a text-to-speech conversion library in Python. Unlike
      alternative libraries, it works offline and is compatible with both
      Python 2 and 3. An application invokes the pyttsx3.init() factory
      function to get a reference to a pyttsx3.Engine instance. It is an
      easy-to-use tool that converts the entered text into speech. On
      Windows, the sapi5 driver provides both a female and a male voice.
      pyttsx3 supports three TTS engines: sapi5 (SAPI5 on Windows), nsss
      (NSSpeechSynthesizer on Mac OS X), and espeak (eSpeak on every
      other platform).
• Speech Recognition - allows us to convert audio into text for further
  processing. The speech recognition module is used to convert the audio
  into text. We use the Google Web Speech API to turn the spoken
  language into text via recognize_google().
• Visual Studio Code - Visual Studio Code is a streamlined code ed-
  itor with support for development operations like debugging, task
  running, and version control. It aims to provide just the tools a
  developer needs for a quick code-build-debug cycle and leaves more
  complex workflows to fuller featured IDEs, such as Visual Studio IDE.
  It is a source code editor made by Microsoft for Windows, Linux and
  macOS. Features include support for debugging, syntax highlighting,
  intelligent code completion, snippets, code refactoring, and embedded
  Git. Users can change the theme, keyboard shortcuts, preferences,
  and install extensions that add additional functionality.
• OS Module - The os module in Python provides functions for
  interacting with the operating system. It is one of Python’s standard
  utility modules and provides a portable way of using operating
  system-dependent functionality. The os and os.path modules include
  many functions to interact with the file system.
• Wikipedia - Wikipedia is a multilingual online encyclopedia created
  and maintained as an open collaboration project by a community of
  volunteer editors using a wiki-based editing system. In this project,
  Python’s wikipedia module is used to fetch a variety of information
  from the Wikipedia website.
• Webbrowser Module - The webbrowser module provides a basic in-
  terface to the system’s standard web browser. It provides an open
  function, which takes a filename or a URL, and displays it in the
  browser. If you call open again, it attempts to display the new page
  in the same browser window.
• smtplib Module - The smtplib module defines an SMTP client session
  object that can be used to send mail to any internet machine with
  an SMTP or ESMTP listener daemon. For details of SMTP and
  ESMTP operation, consult RFC 821 (Simple Mail Transfer Protocol)
  and RFC 1869 (SMTP Service Extensions).
• Wolframalpha - The Wolfram|Alpha Webservice API provides a
  web-based API allowing the computational and presentation
  capabilities of Wolfram|Alpha to be integrated into web, mobile,
  desktop, and enterprise applications. It can compute expert-level
  answers using Wolfram’s algorithms, knowledge base, and AI
  technology, and is made possible by the Wolfram Language.
• Beautiful Soup - Beautiful Soup is a Python library for pulling data
  out of HTML and XML files. It works with your favorite parser to
  provide idiomatic ways of navigating, searching, and modifying the
  parse tree. It commonly saves programmers hours or days of work.
• Datetime - In Python, date and time are not data types of their own,
  but a module named datetime can be imported to work with dates as
  well as times. The datetime module comes built into Python, so there
  is no need to install it separately. It supplies classes to work with
  dates and times, and these classes provide a number of functions to
  deal with dates, times, and time intervals. Date and datetime values
  are objects in Python, so when you manipulate them, you are
  manipulating objects and not strings or timestamps.
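As an illustration of how the smtplib module from the list above could be used for the email feature, the following is a minimal sketch rather than the project's actual code. The function names build_message and send_message, and all addresses, hosts, and credentials passed to them, are hypothetical; the use of STARTTLS is an assumption about the mail server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email; no network access happens here."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_message(message, host, port, username, password):
    """Send the message over an SMTP session upgraded with STARTTLS.

    Assumes the server supports STARTTLS and password login.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()                  # encrypt the session before logging in
        server.login(username, password)
        server.send_message(message)
```

In an assistant, the body would come from the recognized speech, and the credentials would be read from a configuration file or environment variable rather than hard-coded.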
Chapter 4
Implementation
Step 1: Setting up the speech engine
  A pyttsx3 engine instance is created and stored in a variable named
engine. pyttsx3 is a text-to-speech conversion library in Python. Unlike
alternative libraries, it works offline and is compatible with both
Python 2 and 3. It is an easy-to-use tool that converts the entered text
into speech. The voices available to pyttsx3 depend on the platform; on
Windows there are typically a female and a male voice.
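The engine setup described in this step might look like the sketch below. It is not the project's exact code: the helper names create_engine and speak, the default rate of 170, and the voice index are illustrative assumptions. pyttsx3 is imported inside the function so that the file still loads on a machine where the package is not installed.

```python
def create_engine(rate=170, voice_index=0):
    """Create and configure a pyttsx3 engine (requires `pip install pyttsx3`)."""
    import pyttsx3  # third-party; imported lazily so this file loads without it

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    if voices:  # which voices exist depends on the OS speech driver
        engine.setProperty("voice", voices[voice_index % len(voices)].id)
    engine.setProperty("rate", rate)       # speaking rate in words per minute
    return engine


def speak(engine, text):
    """Queue `text` and block until it has been spoken."""
    engine.say(text)
    engine.runAndWait()
```

A typical call would be speak(create_engine(), "Hello, how can I help?"). Keeping the engine as a parameter means one engine can be reused for every reply.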
Step 2: Speech Recognition
  Speech recognition is a machine’s ability to listen to spoken words and
identify them. It allows us to convert audio into text for further
processing, so that spoken words can be used to make a query or give a
reply. We create an instance of the Recognizer class and call its
recognize_google() method to access the Google Web Speech API and turn
spoken language into text.
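A minimal sketch of this step is shown below, assuming the SpeechRecognition package (imported as speech_recognition) and a working microphone. The function names transcribe and listen_once and the 0.5 second noise-calibration time are illustrative choices, not taken from the project.

```python
def transcribe(recognizer, audio):
    """Send recorded audio to the Google Web Speech API and lower-case the text."""
    return recognizer.recognize_google(audio).lower()


def listen_once():
    """Record one phrase from the default microphone and return it as text.

    Returns None when the speech is unintelligible or the API is unreachable.
    """
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:        # Microphone needs PyAudio installed
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return transcribe(recognizer, audio)
    except sr.UnknownValueError:           # speech could not be understood
        return None
    except sr.RequestError:                # network or API failure
        return None
```

Returning None instead of raising lets the calling loop simply ask the user to repeat the command.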
Step 3: Neural network for assistant
  Neural networks comprise layers/modules that perform operations on
data. The torch.nn namespace provides all the building blocks needed to
build and define our neural network, including the neurons through which
data and computations flow. The input comes from the raw dataset, and we
use NumPy to build a single neuron. NLTK is a toolkit built for working
with NLP in Python; it provides various text processing libraries along
with many test datasets. A neural network is a series of algorithms that
endeavours to recognize underlying relationships in a set of data through
a process that mimics the way the human brain operates. It produces its
output without explicitly programmed rules.
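The sketch below shows one plausible shape for this step: sentences are turned into bag-of-words vectors and fed to a small feed-forward network built from torch.nn layers. The layer sizes, the three-layer layout, and the function names are assumptions for illustration; the report does not specify them. The regular-expression tokenizer is a simple stand-in for NLTK's tokenizer, used here so the text-processing part runs without extra downloads.

```python
import re


def tokenize(sentence):
    """Split a sentence into lower-case word tokens (stand-in for NLTK)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def bag_of_words(tokens, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in `tokens`."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocabulary]


def build_model(input_size, hidden_size, num_tags):
    """Build a small feed-forward classifier (requires PyTorch)."""
    import torch.nn as nn  # third-party; imported lazily

    return nn.Sequential(
        nn.Linear(input_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, num_tags),   # one raw score per tag
    )
```

The input size equals the vocabulary length and the output size equals the number of tags, so both follow directly from the dataset.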
Step 4: dataset
  Here we create a .json file that contains tags, patterns, and
responses, which are supplied to the neural network to train the model.
The trained data is then stored in a .pth file, the format PyTorch
commonly uses for saved model data. We use a JSON file because it is a
data interchange format that uses human-readable text to store and
transmit data objects consisting of attribute-value pairs and arrays. It
has two basic data structures: object and array. An object stores a set
of name-value pairs, and an array is a list of values. We created the
dataset ourselves, based on the tasks that have to be carried out.
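To make the dataset layout concrete, here is a small hypothetical example. The two intents, their patterns and responses, and the top-level key "intents" are invented for illustration; only the tag/patterns/responses structure comes from the description above.

```python
import json

# Hypothetical miniature dataset in the tag / patterns / responses layout.
INTENTS_JSON = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hello", "Hi there"],
     "responses": ["Hello, how can I help?"]},
    {"tag": "time",
     "patterns": ["What is the time", "Tell me the time"],
     "responses": ["Let me check the time."]}
  ]
}
"""


def load_training_pairs(text):
    """Flatten the JSON into (pattern, tag) pairs used as training examples."""
    data = json.loads(text)
    pairs = []
    for intent in data["intents"]:
        for pattern in intent["patterns"]:
            pairs.append((pattern, intent["tag"]))
    return pairs
```

Each pattern becomes one training sentence labelled with its tag. The responses are not used for training and are only looked up once a tag has been predicted.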
Step 5: Categorizing
   The model outputs a probability for each tag. It is trained so that,
when we communicate with the assistant, it can categorize the
conversation under a specific tag using these probabilities.
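One common way to obtain such probabilities is to apply softmax to the raw scores produced by the network and accept the best tag only above a confidence threshold. The sketch below assumes this approach; the threshold of 0.75 and the function names are illustrative, not values stated in the report.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorize(scores, tags, threshold=0.75):
    """Return (tag, probability); tag is None when the model is not confident."""
    probabilities = softmax(scores)
    best = max(range(len(tags)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return tags[best], probabilities[best]
    return None, probabilities[best]
```

When the returned tag is None, the assistant can reply that it did not understand instead of running an unrelated task.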
Step 6: Tasks
   There are two types of task functions: input functions and non-input
functions. Examples of non-input functions are Time and Date, while
examples of input functions are Google search and Wikipedia. Such tasks
can be implemented using various Python modules such as datetime,
wikipedia, and pywhatkit. We can also provide tasks that interact with
the operating system using the os library. For each task we add, we
also need to add related tags to the dataset, which improves the
conversation after the model is retrained.
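The split between input and non-input functions can be sketched as two lookup tables keyed by tag. The tag names, function names, and reply wording below are hypothetical. The Google task here only builds the search URL (which could then be passed to webbrowser.open), and wikipedia.summary requires the third-party wikipedia package.

```python
import datetime
import urllib.parse


def tell_time(now=None):
    """Non-input task: report the current time."""
    now = now or datetime.datetime.now()
    return now.strftime("The time is %H:%M")


def tell_date(now=None):
    """Non-input task: report today's date."""
    now = now or datetime.datetime.now()
    return now.strftime("Today is %d %B %Y")


def google_search_url(query):
    """Input task: build a Google search URL for the spoken query."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)


def wikipedia_summary(query, sentences=2):
    """Input task: fetch a short summary (requires `pip install wikipedia`)."""
    import wikipedia  # third-party; imported lazily

    return wikipedia.summary(query, sentences=sentences)


NON_INPUT_TASKS = {"time": tell_time, "date": tell_date}
INPUT_TASKS = {"google": google_search_url, "wikipedia": wikipedia_summary}


def run_task(tag, query=""):
    """Dispatch a predicted tag to its task; returns None for unknown tags."""
    if tag in NON_INPUT_TASKS:
        return NON_INPUT_TASKS[tag]()
    if tag in INPUT_TASKS:
        return INPUT_TASKS[tag](query)
    return None
```

Adding a new task then means writing one function, registering it in the appropriate table, and adding a matching tag to the dataset.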
Chapter 5
System Design
5.1    ER Diagram
  The above diagram shows the entities and their relationships for a
virtual assistant system. The system has a user, who can have keys and
values that store any information about the user. For example, for the
key “name” the value can be “Jim”. The user may want to keep some keys
secure; for these, a lock can be enabled and a password (voice clip)
set. A single user can ask multiple questions. Each question is given an
ID by which it is recognized, along with the query and its corresponding
answer. A user can also have any number of tasks. Each task has its own
unique ID and a status, i.e. its current state. A task also has a
priority value and a category indicating whether it is a parent task or
a child task of an older task.
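The entities described above could be modelled with plain data classes, as in the following sketch. All field names, the default status "pending", and the choice to represent the parent/child category with an optional parent_id are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Question:
    question_id: int
    query: str
    answer: str = ""


@dataclass
class Task:
    task_id: int
    status: str = "pending"            # current state of the task
    priority: int = 0
    parent_id: Optional[int] = None    # None marks a parent task


@dataclass
class User:
    values: Dict[str, str] = field(default_factory=dict)         # e.g. "name" -> "Jim"
    locked_keys: Dict[str, bytes] = field(default_factory=dict)  # key -> voice-clip password
    questions: List[Question] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)
```

The one-to-many relationships from the diagram appear as the questions and tasks lists held by each user.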
5.2    Activity Diagram
  Initially, the system is in idle mode. When it receives a wake-up
call, it begins execution. The received command is classified as either a
question or a task to be performed, and the corresponding action is
taken. After the question has been answered or the task has been
performed, the system waits for another command. This loop continues
until it receives the quit command. At that moment, it goes back to sleep.
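The idle, active, and sleep cycle described above can be sketched as a loop over recognized phrases. The wake word "helper", the literal quit phrase, and the function name run_assistant are assumptions for illustration; phrases are passed in as an iterable so the loop can be exercised without a microphone.

```python
def run_assistant(heard_phrases, handle_command, wake_word="helper"):
    """Drive the idle -> active -> idle cycle over a stream of recognized phrases."""
    awake = False
    responses = []
    for phrase in heard_phrases:
        text = phrase.lower().strip()
        if not awake:
            awake = wake_word in text      # stay idle until the wake word is heard
            continue
        if text == "quit":
            awake = False                  # go back to sleep
            continue
        responses.append(handle_command(text))
    return responses
```

In a live assistant, heard_phrases would be a generator that yields one transcribed utterance at a time, and handle_command would classify the text and run the matching question or task handler.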
5.3       Class Diagram
  The user class has two attributes: the command that it sends as audio
and the response it receives, which is also audio. It provides functions
to listen to the user’s command, interpret it, and then reply or send
back a response accordingly. The question class holds the command in
string form, as interpreted by the interpret class. The task class also
holds the interpreted command in string format. It has various
functions such as reminder, note, mimic, research, and reader.
5.4    Use Case Diagram
  In this project there is only one user. The user issues a command to
the system. The system then interprets it and fetches the answer. The
response is sent back to the user.
5.5    Sequence Diagram
  The above sequence diagram shows how the answer to a question asked by
the user is fetched from the internet. The audio query is interpreted and
sent to the web scraper. The web scraper searches for and finds the
answer. It is then sent back to the speaker, which speaks the answer to
the user.
Chapter 6
Result
6.1   User Interface
  Shows the interface for starting or stopping the program.
6.2   Output
  Shows the output when asked to play music...
  Shows the output when asked to open YouTube...
Chapter 7
BIBLIOGRAPHY
Websites referred
 • www.stackoverflow.com
 • www.pythonprogramming.net
 • www.codecademy.com
 • www.tutorialspoint.com
YouTube Channels referred
 • code with harry
 • mySirG