on
“VERB - O - CONTROL”
Submitted in partial fulfillment of the requirement for the award of the degree
of
JADALA PRAMOUL
UID: 111723039144
Dr. P. Dayaker
HoD, Assistant Professor
DEPARTMENT OF MCA
LOYOLA ACADEMY DEGREE & PG COLLEGE
Department of Master of Computer Applications
CERTIFICATE
A Acknowledgement
B Declaration
C Abstract
E List of tables
Chapter - 2 SYSTEM ANALYSIS
2.1 Hardware And Software Requirements
2.3 Scope
3.3 Frameworks
4.2 Architecture
Chapter - 6 IMPLEMENTATION
Chapter - 7 TESTING
7.4 Maintenance
9.1 Conclusion
Chapter - 10 BIBLIOGRAPHY
10.1 Websites
10.2 References
LIST OF FIGURES
7.6 (TEST CASES)
7.6.1 UPLOADING PDFS
7.6.2 PROCESSING PDFS
7.6.3 FILES PROCESSED AND WAITING FOR QUESTIONS
Verb-o-Control DEPARTMENT OF MCA
LIST OF OUTPUTS
LOYOLA ACADEMY
ACKNOWLEDGEMENT
This acknowledgment transcends the reality of formality, and I would like to express deep
gratitude and respect to all those people behind this project who guided, inspired, and helped
me complete it.
I express my profound gratitude to Rev. Fr. Dr N.B Babu SJ, the Principal of Loyola Academy
Degree and PG College, and Dr. P. Dayaker, the Head of the MCA department, for giving me
this opportunity to pursue this Major-project, which I was passionate about, and helping me
add a feather to my hat of educational assets.
I extend my special thanks to my internal guide, Dr. P. Dayaker, for the time and effort he
gave throughout the year. His guidance, advice, and suggestions were extremely helpful to me
during the completion of the major project, and I am deeply grateful to him.
I acknowledge that this Major-project was completed entirely by me and not by someone else.
DECLARATION
I, Jadala Pramoul, a student of NMCA, hereby declare that the Major-Project titled “VERB
- O - CONTROL”, which is submitted by me to Dr. P. Dayaker, Loyola Academy Degree
and PG College, Secunderabad, Alwal, in partial fulfillment of the requirement for the award of
the degree of computer science, has not previously formed the basis for the award of
any degree, diploma, or other similar title or recognition. The author attests that permission
has been obtained for the use of any copyrighted material appearing in the dissertation or
Major-Project report, other than brief excerpts requiring only proper acknowledgment in
scholarly writing, and that all such use is acknowledged.
ABSTRACT
A standout feature of this system is its slide content search. This enables users to speak a
keyword or phrase, and the system intelligently identifies and jumps to the most relevant slide
based on slide content. This makes it particularly useful in situations where quick access to
specific content is needed, enhancing both the efficiency and interactivity of the presentation
experience.
The system also supports future-ready integrations like multi-language command recognition and
gesture-based controls, making it accessible to a wider range of users, including those with
mobility challenges. It is built using Python, integrating libraries such as speech_recognition,
pyautogui, and python-pptx, and optionally leverages AI frameworks for content understanding
and speech processing.
By offering a voice-first interface for presentation control, the system enhances the delivery of
lectures, meetings, and seminars. It not only improves the flow and professionalism of
presentations but also brings greater accessibility and engagement to digital communication.
“Voice-Controlled Presentation & Slide Navigator” represents a smart leap toward hands-free
interactions in modern presentation environments.
1. INTRODUCTION
This chapter provides an overview of the system’s purpose, aim, objectives, background, scope, and
module-wise breakdown. The project focuses on enhancing the presentation experience using
voice commands.
The purpose of this project is to modernize and simplify presentation control using voice commands.
Presenters often struggle with seamless transitions during presentations due to the constant need to
use a keyboard, mouse, or remote. This system aims to eliminate that friction by allowing full slide
The primary aim of the project is to develop a reliable, real-time voice-based navigation system
for PowerPoint or Google Slides, enabling users to move between slides, jump to specific slides, or
even search based on slide content using spoken commands. The solution is especially beneficial
for differently-abled users and presenters seeking hands-free operation for a smooth delivery.
2. Improved Accessibility: Build an inclusive solution that aids individuals with physical
3. Real-Time Slide Matching: Implement AI-based keyword detection to locate and jump
to specific slides.
In today’s digital classrooms, corporate meetings, and conferences, presentations are central to
Inspired by advancements in speech recognition and AI, this project was initiated to provide an
intuitive, voice-first interface for navigating presentation slides. The idea is rooted in making
presentations more fluid, accessible, and engaging. As speech-based assistants like Alexa and Siri
gain traction, it’s logical to apply similar technology to the domain of public speaking and education.
The system also caters to those who may not be able to physically interact with traditional input
devices, making it a more inclusive tool. With voice commands like “next slide,” “go to slide five,”
or “search revenue,” the presenter can maintain eye contact with the audience while keeping full
control over the presentation content. This project aims to redefine the way we interact with digital
The scope of this project is focused on the development and implementation of a Python-based
technologies including speech recognition, keyword search, AI-based slide matching, and GUI
automation.
Speech Command Capture: Use microphones and APIs to recognize voice input in real time.
Slide Text Extraction: Extract text content from PPTX slides using Python libraries for
matching.
Command Interpretation & Mapping: Parse recognized speech and match to relevant
actions or content.
Slide Automation: Use automation tools to control presentations based on user commands.
Accessibility Support: Provide features that enhance usability for differently-abled users.
based annotations.
The Voice-Controlled Presentation & Slide Navigator system is divided into multiple functional
accuracy.
Technology: Python-based logic that maps speech to functions like slide jump, start, stop,
etc.
Responsibility: Matches spoken keywords with slide content to jump directly to the most
relevant slide.
similarity.
slide.
Technology: Uses pyautogui to simulate key presses and control the presentation.
differently-abled users.
Technology: Includes support for language translation APIs and adjustable sensitivity
levels.
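The keyword-to-slide matching described in the module breakdown could be approximated with a simple word-similarity score. The sketch below is only an illustration under stated assumptions: the function name `best_matching_slide`, the 0.6 threshold, and the sample slide texts are hypothetical, and the standard-library `difflib` module stands in for whatever similarity measure the project actually uses.

```python
from difflib import SequenceMatcher


def best_matching_slide(query, slide_texts, threshold=0.6):
    """Return the 1-based number of the slide most similar to the query, or None.

    slide_texts is a list of (slide number, text) pairs. The threshold is an
    assumed cut-off below which no slide is considered a match.
    """
    query_words = query.lower().split()
    best_num, best_score = None, 0.0
    for num, content in slide_texts:
        words = content.lower().split()
        if not words or not query_words:
            continue
        # Score each query word by its closest word on the slide, then average.
        score = sum(
            max(SequenceMatcher(None, q, w).ratio() for w in words)
            for q in query_words
        ) / len(query_words)
        if score > best_score:
            best_num, best_score = num, score
    return best_num if best_score >= threshold else None


# Hypothetical slide texts used only for illustration.
slides = [
    (1, "Introduction and agenda"),
    (2, "Quarterly revenue by region"),
    (3, "Summary and next steps"),
]
print(best_matching_slide("revenue", slides))
```

Because the score is based on character similarity, a slightly misrecognized word can still land on the intended slide, while unrelated speech falls below the threshold and is ignored.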
2. SYSTEM ANALYSIS
This chapter provides a comprehensive understanding of the system’s design process, including
hardware and software requirements, a Software Requirement Specification (SRS), and
comparisons between the existing and proposed systems. It outlines both functional and non-
functional requirements, supporting the full development of the project.
Storage: 40 GB or more
Software Requirement Specification (SRS) acts as a communication bridge between clients and
developers. It includes understanding the problem, identifying goals, and preparing structured
requirements before actual development begins.
Requirement Analysis:
Identify use cases like jump-to-slide, keyword search, and start/stop control
Requirement Specification:
Functional Requirements
Feedback Module:
Provides verbal confirmation using TTS (text-to-speech)
Non-Functional Requirements
Performance:
Must recognize commands in real-time and execute within 1–2 seconds
Reliability:
Should work with consistent accuracy under varying voice tones
Scalability:
Should be extendable to control Google Slides or PDF presentations
Security:
Should handle only user’s voice input without unauthorized triggers
Usability:
Clean and intuitive to operate during live presentations
Maintainability:
Should be modular and well-commented for future updates
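The performance requirement above (execution within 1–2 seconds) can be checked during testing by timing each command handler. The following is a minimal sketch, assuming a hypothetical helper `run_within_budget` and a 2-second budget taken from the upper end of the stated range; it is not part of the project code.

```python
import time


def run_within_budget(handler, command, budget_seconds=2.0, clock=time.perf_counter):
    """Run handler(command) and report whether it finished inside the budget.

    Returns (handler result, elapsed seconds, True if within budget).
    The clock argument is injectable so the check can be tested deterministically.
    """
    start = clock()
    result = handler(command)
    elapsed = clock() - start
    return result, elapsed, elapsed <= budget_seconds


# Example with a trivial stand-in handler.
result, elapsed, ok = run_within_budget(str.upper, "next slide")
print(result, ok)
```

Passing the real command handler in place of `str.upper` gives a per-command latency figure that can be logged alongside the test cases.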
2.2.3 SCOPE
This document serves as the single source of truth for system requirements. It ensures the design
remains aligned with user expectations and that all functional and non-functional needs are well-
documented. All future changes will follow a defined approval process.
Mouse clicks
The proposed voice-controlled presentation navigator addresses the limitations of existing systems
by offering:
Hands-free navigation
Voice-Driven Precision
3. TECHNOLOGIES USED
Python is chosen for its ease of use and availability of vast libraries such as speech_recognition,
pyttsx3, pyautogui, and more. It supports rapid prototyping and integration with external APIs.
Allows dynamic control like “search machine learning” or “go to intro slide”
Python provides seamless integration with speech recognition libraries, GUI automation tools, and
PowerPoint handling frameworks, making it the ideal language for this project. Its flexibility allows
for robust voice command interpretation and precise slide navigation, aligning perfectly with the
3.3 Frameworks
SpeechRecognition:
Overview: Python library that enables speech-to-text functionality using APIs such as
Use in Project: Converts spoken commands like “next”, “slide four”, or “search
pyautogui:
Overview: Cross-platform GUI automation tool that simulates mouse and keyboard events.
Use in Project: Used to control PowerPoint slides (next, previous, specific slide jumps) via
keyboard shortcuts.
python-pptx:
Use in Project: Parses slide content to enable keyword-based search and slide indexing.
Advantages: Access to slide titles, text, and layout for accurate mapping and navigation.
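Keyword-based search and slide indexing can be illustrated with a small inverted index that maps each word to the slides containing it. This is a sketch under assumptions: `build_index` and `find_slides` are hypothetical names, and the slide texts are supplied as plain (number, text) pairs rather than read from a PPTX file, so the example runs without python-pptx.

```python
import re
from collections import defaultdict


def build_index(slide_texts):
    """Map every lower-cased word to the set of slide numbers that contain it."""
    index = defaultdict(set)
    for num, content in slide_texts:
        for word in re.findall(r"[a-z0-9]+", content.lower()):
            index[word].add(num)
    return index


def find_slides(index, phrase):
    """Return the sorted slide numbers that contain every word of the phrase."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical slide texts used only for illustration.
slides = [
    (1, "Introduction to Machine Learning"),
    (2, "Machine learning pipeline: data, model, evaluation"),
    (3, "Summary"),
]
index = build_index(slides)
print(find_slides(index, "machine learning"))
print(find_slides(index, "summary"))
```

Building the index once at start-up keeps each spoken search to a few dictionary lookups, which helps with the real-time requirement.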
SpeechRecognition:
Usage: Converts user voice into commands to control the PowerPoint presentation.
python-pptx:
Usage: Extracts text from slides to allow keyword-based navigation (e.g., “search
summary”).
pyautogui:
Usage: Simulates slide transitions (next, previous, jump to slide) using voice-triggered key
events.
pyttsx3:
Usage: Provides spoken feedback to the user, such as “Jumping to slide five.”
Usage: May assist in managing extracted slide data or maintaining logs of user actions.
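One way to connect the libraries listed above is a dispatch table that pairs each spoken trigger with a key press and a feedback message. The sketch below assumes hypothetical trigger words and messages; the key names (right, left, f5, esc) are the ones used in the implementation chapter. The `press` and `speak` callables are injected so the logic can be exercised without a display or audio device.

```python
def make_dispatcher(press, speak):
    """Build a dispatch function from a key-press callable and a feedback callable."""
    # Trigger words and feedback messages are assumptions for illustration.
    actions = {
        "next": ("right", "Next slide"),
        "previous": ("left", "Previous slide"),
        "start": ("f5", "Starting the slide show"),
        "stop": ("esc", "Ending the slide show"),
    }

    def dispatch(command):
        """Press the key for the first trigger found; return True if one matched."""
        words = command.lower().split()
        for trigger, (key, message) in actions.items():
            if trigger in words:
                press(key)
                speak(message)
                return True
        return False

    return dispatch


# Stand-ins that record calls instead of touching the keyboard or speakers.
pressed, spoken = [], []
dispatch = make_dispatcher(pressed.append, spoken.append)
dispatch("next slide")
print(pressed, spoken)
```

In the running system, `press` would typically be `pyautogui.press`, and `speak` a small function that calls `say()` and `runAndWait()` on a pyttsx3 engine.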
System design is the transition from a user-oriented document to one aimed at programmers or
database personnel. The design is a solution: a description of how to approach the creation of a
new system. It is composed of several steps and provides the understanding and procedural details
necessary for implementing the system recommended in the feasibility study. Design goes through
logical and physical stages of development. Logical design reviews the present physical system,
prepares input and output specifications, details the implementation plan, and prepares a logical
design walkthrough.
4.2 ARCHITECTURE:
To the presenter, the application appears as a voice interface layered over the slide show.
Internally it is a Python program with cooperating parts: speech capture and recognition
(speech_recognition), command interpretation, slide text extraction (python-pptx), and slide
automation through simulated key presses (pyautogui), with optional spoken feedback (pyttsx3).
The extracted slide text is held in memory while the presentation runs.
Things.
Relationships.
Diagrams.
Structural things.
Behavioral things.
Grouping things.
Annotational things.
1. Structural things are the nouns of UML models. The structural things used in the
project design are:
First, a class is a description of a set of objects that share the same attributes,
operations, relationships and semantics.
Fig: Classes (example class Window with attributes origin and size, and operations open(),
close(), move(), and display())
Second, a use case is a description of a set of sequences of actions that a system
performs to yield an observable result of value to a particular actor.
Fig: Nodes
2. Behavioral things are the dynamic parts of UML models. The behavioral thing used
is:
Interaction: An interaction is a behavior that comprises a set of messages
exchanged among a set of objects within a particular context to accomplish a
Fig: Messages
Fig: Dependencies
Fig: Association
A generalization is a specialization/generalization relationship in which objects of
the specialized element (the child) are substitutable for objects of the generalized
element (the parent).
Fig: Generalization
A realization is a semantic relationship between classifiers, wherein one classifier
specifies a contract that another classifier guarantees to carry out.
Fig: Realization
A use case diagram is a graph of actors, a set of use cases enclosed by a system boundary,
communication associations between the actors and the use cases, and generalization among use cases.
The use case model defines the outside (actors) and inside (use case) of the system’s behavior.
Sequence diagrams are used to represent the flow of messages, events and actions
between the objects or components of a system. Time is represented in the vertical direction
showing the sequence of interactions of the header elements, which are displayed horizontally
at the top of the diagram.
Deployment diagrams are used to visualize the topology of the physical components of a system
where the software components are deployed. They therefore describe the static deployment
view of a system. Deployment diagrams consist of nodes and their relationships.
5. INPUT/OUTPUT DESIGN
Input design for the voice-controlled navigator covers the ways a presenter interacts
with the system. Input takes the form of spoken commands captured through the microphone,
together with the presentation file whose slide text is extracted when the program starts.
Recognized speech is converted to lower-case text and checked against the supported commands;
speech that cannot be recognized is ignored, which keeps unintended input from triggering
slide changes.
6. IMPLEMENTATION
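import speech_recognition as sr
import pyautogui
import time
import os
import re
from pptx import Presentation

# Path to the presentation. The extracted listing does not show where
# ppt_path is set, so this file name is only a placeholder.
ppt_path = "presentation.pptx"

# Load the presentation and collect the text of every slide as
# (slide number, text) pairs.
prs = Presentation(ppt_path)
slide_texts = []
for idx, slide in enumerate(prs.slides):
    text = ""
    for shape in slide.shapes:
        if hasattr(shape, "text"):
            text += shape.text + " "
    slide_texts.append((idx + 1, text.strip()))

# Spoken number words mapped to slide numbers
word_to_num = {
    "one": 1, "first": 1,
    "two": 2, "second": 2,
    "three": 3, "third": 3,
    "four": 4, "fourth": 4,
    "five": 5, "fifth": 5,
    "six": 6, "sixth": 6,
    "seven": 7, "seventh": 7,
    "eight": 8, "eighth": 8,
    "nine": 9, "ninth": 9,
}

def listen_command():
    """Listen on the default microphone and return the recognized text in lower case."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.8)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        # Speech was unintelligible or the recognition service was unreachable
        return ""

# Navigation Functions
def go_to_slide(slide_num):
    """Type the slide number and press Enter to jump to it in slide-show mode."""
    pyautogui.typewrite(str(slide_num))
    pyautogui.press("enter")

def search_slide(keyword):
    """Jump to the first slide whose text contains the keyword."""
    for num, content in slide_texts:
        # Recognized speech is lower case, so compare against lower-cased text
        if keyword in content.lower():
            go_to_slide(num)
            pyautogui.hotkey('ctrl', 'f')
            time.sleep(0.3)
            pyautogui.typewrite(keyword)
            return

def handle_command(command):
    """Map a recognized command to a navigation action.

    The trigger words below are assumed; the extracted listing preserves
    only the actions, not the conditions that select them.
    """
    if "next" in command:
        pyautogui.press("right")
    elif "previous" in command:
        pyautogui.press("left")
    elif "start" in command:
        pyautogui.press("f5")
    elif "stop" in command:
        pyautogui.press("esc")
    elif "last" in command:
        go_to_slide(len(slide_texts))
    elif "first" in command:
        go_to_slide(1)
    elif "search" in command:
        keyword = command.replace("search", "", 1).strip()
        if keyword:
            search_slide(keyword)
    else:
        # "slide 4": a number given as digits after the word "slide"
        match = re.search(r"slide (\d+)", command)
        if match:
            go_to_slide(int(match.group(1)))
            return
        # "slide four" or "fourth slide": a number given as a word
        for word, num in word_to_num.items():
            if word in command:
                go_to_slide(num)
                return
        # Any other digits in the command
        numbers = re.findall(r"\d+", command)
        if numbers:
            go_to_slide(int(numbers[0]))

# Open the presentation (os.startfile is Windows-only) and start the slide show
os.startfile(ppt_path)
time.sleep(5)  # assumed delay so the presentation can finish opening
pyautogui.press("f5")

# Main loop: keep listening and act on every recognized command
while True:
    cmd = listen_command()
    if cmd:
        handle_command(cmd)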
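The listing above looks for number words with a substring test (`if word in command`), which can fire on unrelated speech, for example "one" inside "phone". A stricter alternative is to split the command into whole tokens first. The sketch below is one possible refinement, not the project's code; `parse_slide_number` is a hypothetical name, and the word table repeats the one from the listing.

```python
import re

WORD_TO_NUM = {
    "one": 1, "first": 1, "two": 2, "second": 2, "three": 3, "third": 3,
    "four": 4, "fourth": 4, "five": 5, "fifth": 5, "six": 6, "sixth": 6,
    "seven": 7, "seventh": 7, "eight": 8, "eighth": 8, "nine": 9, "ninth": 9,
}


def parse_slide_number(command):
    """Return the first slide number named in the command, or None.

    Only whole tokens are considered, so number words embedded in longer
    words are not treated as slide numbers.
    """
    for token in re.findall(r"[a-z]+|\d+", command.lower()):
        if token.isdigit():
            return int(token)
        if token in WORD_TO_NUM:
            return WORD_TO_NUM[token]
    return None


print(parse_slide_number("go to slide five"))
print(parse_slide_number("open the phone slide"))
```

The result can be passed to `go_to_slide` when it is not `None`, replacing the three separate number checks in `handle_command`.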
7. TESTING
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and code generation.
• To ensure that during operation the system will perform as per specification.
• To make sure that system meets the user requirements during operation
• To make sure that during the operation, incorrect input, processing and output will be
detected
• To see that when correct inputs are fed to the system the outputs are correct
• To verify that the controls incorporated in the system function as intended
• Testing is a process of executing a program with the intent of finding an error
• A good test case is one that has a high probability of finding an as yet undiscovered
error
The software developed has been tested successfully using the following testing
strategies and any errors that are encountered are corrected and again the part of the
program or the procedure or function is put to testing until all the errors are removed.
A successful test is one that uncovers an as yet undiscovered error.
Note that the results of system testing demonstrate that the system is working
correctly. This gives confidence to the system designer and the users of the system, and
prevents frustration during the implementation process.
Output testing.
Validation testing.
System testing.
1) White Box Testing:
White box testing is a test case design method that uses the control structure of the
procedural design to derive test cases. All independent paths in a module are exercised at least
once, all logical decisions are exercised, all loops are executed at their boundaries and within
their operational bounds, and internal data structures are exercised to ensure their validity.
Here the customer is given three chances to enter a valid choice out of the given menu, after
which the control exits the current menu.
2) Black Box Testing:
Black box testing attempts to find errors in the following categories: incorrect or
missing functions, interface errors, errors in data structures, performance errors, and
initialization and termination errors. Here all the input data must match the data type to
become a valid entry.
3) Unit Testing:
Unit testing focuses verification effort on the smallest unit of Software design that is
the module. Unit testing exercises specific paths in a module’s control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit. Hence, the naming is Unit Testing.
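As an illustration of unit testing applied to this project, the command-interpretation logic can be tested in isolation if it is written as a pure function that takes text and returns an action name. The sketch below assumes a hypothetical helper `classify_command` and uses the standard `unittest` module; it does not reproduce the project's actual test suite.

```python
import unittest


def classify_command(command):
    """Hypothetical helper: map recognized text to an action name."""
    words = command.lower().split()
    if "next" in words:
        return "next"
    if "previous" in words:
        return "previous"
    if words[:1] == ["search"] and len(words) > 1:
        return "search"
    return "unknown"


class ClassifyCommandTest(unittest.TestCase):
    def test_next_slide(self):
        self.assertEqual(classify_command("Next slide"), "next")

    def test_previous_slide(self):
        self.assertEqual(classify_command("previous slide please"), "previous")

    def test_search_needs_keyword(self):
        self.assertEqual(classify_command("search revenue"), "search")
        self.assertEqual(classify_command("search"), "unknown")

    def test_unrelated_speech(self):
        self.assertEqual(classify_command("good morning everyone"), "unknown")


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyCommandTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```

Keeping the microphone and key-press calls outside this function is what makes the module testable as a unit without audio hardware.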
4) Integration Testing:
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated a set of high order tests are
conducted. The main objective in this testing process is to take unit tested modules and build
a program structure that has been dictated by design.
This method begins the construction and testing with the modules at the lowest level in
the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs is
eliminated.
User Acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users at the time of developing and making changes wherever required. The
system developed provides a friendly user interface that can easily be understood even by a
person who is new to the system.
6) Output Testing:
After performing the validation testing, the next step is output testing of the proposed
system, since no system could be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration are
tested by asking the users about the format they require. The output format is considered in
two ways: on screen and in printed form.
7) Validation Testing:
Text Field:
The text field can contain only a number of characters less than or equal to its size.
The text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect
entry always triggers an error message.
Numeric Field:
The numeric field can contain only the digits 0 to 9. An entry of any other character
triggers an error message. The individual modules are checked for accuracy and for what they
have to perform.
The above testing is carried out using various kinds of test data. Preparation of test data
plays a vital role in system testing. After the test data is prepared, the system under study is
tested using it. Errors uncovered during this testing are corrected using the testing steps
above, and the corrections are noted for future use.
Live test data are those that are actually extracted from organization files. After a system
is partially constructed, programmers or analysts often ask users to key in a set of data from
their normal activities. Then, the systems person uses this data as a way to partially test the
system. In other instances, programmers or analysts extract a set of live data from the files and
have them entered themselves.
Artificial test data are created solely for test purposes, since they can be generated to
test all combinations of formats and values. In other words, the artificial data, which can
quickly be prepared by a data generating utility program in the information systems department,
make possible the testing of all logic and control paths through the program.
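For this project, artificial test data can be generated by combining command phrasings and number formats. The sketch below is a hypothetical generator, assuming a few illustrative prefixes, number forms, and suffixes; it uses `itertools.product` to enumerate every combination.

```python
from itertools import product

# Illustrative building blocks; the real command vocabulary may differ.
PREFIXES = ["slide", "go to slide", "jump to slide"]
NUMBERS = ["1", "five", "ninth"]
SUFFIXES = ["", " please"]


def generate_test_commands():
    """Return every combination of prefix, number form, and suffix as a command string."""
    return [
        f"{prefix} {number}{suffix}"
        for prefix, number, suffix in product(PREFIXES, NUMBERS, SUFFIXES)
    ]


commands = generate_test_commands()
print(len(commands))
```

Each generated string can then be fed to the command handler in place of live speech, which exercises the parsing paths without a microphone.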
The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a testing plan,
using the systems specifications.
Whenever a new system is developed, user training is required to educate them about
the working of the system so that it can be put to efficient use by those for whom the system
has been primarily designed. For this purpose the normal working of the project was
demonstrated to the prospective users. Its working is easily understandable and since the
expected users are people who have good knowledge of computers, the use of this system is
very easy.
7.4 MAINTENANCE:
This covers a wide range of activities including correcting code and design errors. To
reduce the need for maintenance in the long run, we have more accurately defined the user’s
requirements during the process of system development. Depending on the requirements, this
system has been developed to satisfy the needs to the largest possible extent. With development
in technology, it may be possible to add many more features based on the requirements in
future. The coding and designing is simple and easy to understand which will make
maintenance easier.
A strategy for system testing integrates system test cases and design techniques into a
well planned series of steps that results in the successful construction of software. The testing
strategy must incorporate test planning, test case design, test execution, and the resultant data
collection and evaluation. A strategy for software testing must accommodate low-level tests
that are necessary to verify that a small source code segment has been correctly implemented
as well as high level tests that validate major system functions against user requirements.
8. OUTPUT SCREENS
8.1 Output
9.1 CONCLUSION:
In conclusion, the development and implementation of the voice-controlled PPT and PDF
navigator represent a significant advancement in harnessing speech recognition and automation
technologies for seamless presentation control. By integrating intelligent voice command
processing and real-time content mapping, the system delivers accurate and efficient navigation
through slides and documents. The robust command interpretation and output feedback ensure
smooth interaction between the user and the interface, enhancing accessibility and convenience.
With support for natural language, flexible phrasing, and keyword-based search, the system
empowers users to control presentations effortlessly, even in noisy environments, improving
productivity and user experience.
The future scope for the voice-controlled PPT and PDF navigator presents numerous opportunities
for enhancement and innovation. Firstly, continuous improvements can be directed toward
refining voice recognition accuracy, especially in diverse acoustic environments and multilingual
settings. Additionally, integrating advanced natural language processing (NLP) models could
enable deeper understanding of user intent, allowing for more conversational and context-aware
interactions. The system can also evolve to support real-time collaboration, remote presentation
control, and integration with popular platforms like Google Slides or Zoom. Expanding support to
other document formats and incorporating AI-based content summarization and smart
recommendations can further enrich user experience. Collaborative development across
educational, corporate, and accessibility sectors can help in scaling this solution to broader use
cases and audiences.