PROJECT REPORT ON EYE CONTROLLED MOUSE: MEDIAPIPE AND
NUMPY-BASED HCI SOLUTION
- SUBMITTED BY -
SAMBIT CHANDA
UNIVERSITY ROLL NO.: - 500221010056
SAPTARSHI PRAMANIK
UNIVERSITY ROLL NO.: - 500221010059
SHIBAM SANYAL
UNIVERSITY ROLL NO.: - 500221010062
SWARNABHA DUTTA
UNIVERSITY ROLL NO.: - 500221010074
SUVAYAN DAS
UNIVERSITY ROLL NO.: - 50022211003
SAYAN KUNDU
UNIVERSITY ROLL NO.: - 50022211035
- Under the guidance of -
Dr. (Prof.) Suparna Biswas
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
GURU NANAK INSTITUTE OF TECHNOLOGY
SODEPUR, PANIHATI
KOLKATA-700114
CERTIFICATE OF RECOMMENDATION
This is to certify that the project entitled ‘EYE CONTROLLED MOUSE: MEDIAPIPE
AND NUMPY-BASED HCI SOLUTION’ is a bona fide work carried out by Sambit
Chanda, Saptarshi Pramanik, Shibam Sanyal, Sayan Kundu, Suvayan Das and
Swarnabha Dutta, who are students of B.Tech. (4th year) in ECE at Guru Nanak
Institute of Technology during the academic year 2024-2025. This report on the
said project is submitted in partial fulfilment of the requirements for the award
of the degree of Bachelor of Technology (Electronics and Communication
Engineering). The work has been carried out under my supervision and has not been
submitted elsewhere for the award of any degree or for any other purpose.
-------------------------------------- -----------------------------------
Signature of Project Guide Signature of HOD
(Dr. Suparna Biswas) Department of ECE, GNIT
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to everyone who contributed to
the completion of this project.
Firstly, we would like to thank our group members:
Sayan Kundu, Shibam Sanyal, Sambit Chanda, Saptarshi Pramanik, Swarnabha
Dutta, and Suvayan Das.
Their strengths, collaborative spirit, and unwavering dedication were
instrumental in achieving this project's goals.
Secondly, we would like to thank Dr. Suparna Biswas Ma’am, our project
instructor, for her guidance and valuable feedback throughout the project.
Her support and encouragement helped us stay focused and motivated
throughout the research and writing process.
We are grateful for the support and guidance we received from everyone
mentioned above. This project would not have been possible without their
invaluable contributions.
Sincerely,
The members of the Group.
INDEX
Sl. No.  Topic
1.  Abstract
2.  Objective
3.  Introduction
4.  Working Principle
5.  Implementation Details
6.  Explanation of Python Code
7.  Flowchart
8.  Output
9.  Future Scope
10. Conclusion
ABSTRACT
This project presents an innovative approach to human-computer interaction
(HCI) by developing an eye-controlled mouse system. Leveraging the power of
MediaPipe and NumPy, the system accurately tracks eye movements and
translates them into cursor movements on a computer screen. By utilizing real-
time video processing techniques, the project aims to provide an accessible and
intuitive interface for individuals with limited mobility. The system's potential
applications extend to gaming, education, and assistive technologies, making it
a valuable tool for enhancing user experience.
OBJECTIVE
This project aims to develop and evaluate a proof-of-concept eye-controlled
mouse system using the MediaPipe and NumPy libraries. The primary objective
is to investigate the feasibility of translating real-time eye movements, captured
through a webcam, into accurate cursor movements on a computer screen. This
research seeks to explore the potential of this technology as an alternative input
method for individuals with limited motor abilities.
Key aspects of this objective:
- Feasibility: Determine if a functional eye-controlled mouse system can be
developed using the specified libraries.
- Accuracy: Evaluate the accuracy of eye gaze estimation and cursor
control.
- Usability: Assess the system's usability and user experience for potential
end-users.
- Technology Exploration: Investigate the potential and limitations of using
MediaPipe and NumPy for developing such a system.
This objective clearly states the primary goal of the project and outlines the key
areas of investigation.
INTRODUCTION
Traditional methods of interacting with computers, such as using a mouse and
keyboard, rely heavily on fine motor skills. For individuals with physical
disabilities, these traditional interfaces can present significant challenges.
Conditions like motor neuron diseases, spinal cord injuries, and cerebral palsy
can severely limit the ability to use conventional input devices.
This limitation creates a significant barrier to accessing technology and
participating fully in digital society. Individuals with disabilities may face
difficulties in accomplishing everyday tasks such as browsing the internet,
communicating, and accessing educational resources.
Eye-tracking technology offers a promising alternative by enabling users to
control computers solely through their gaze. By tracking the movement of the
eyes, the system can accurately determine the user's intended point of focus on
the screen. This information can then be translated into cursor movements,
allowing for precise control of the computer.
This project aims to explore the development of an eye-controlled mouse system
using two powerful libraries.
MediaPipe: A Google-developed framework that provides pre-trained machine
learning models for various computer vision tasks, including facial landmark
detection. This will be crucial for accurately locating and tracking the user's eyes
within the video feed.
NumPy: A fundamental library for numerical computing in Python. NumPy will
be utilized for efficient image processing, mathematical calculations, and
implementing the algorithms necessary for gaze estimation and cursor control.
WORKING PRINCIPLE
The eye-controlled mouse system operates on the following principles:
• Real-time Video Capture: A webcam captures video frames of the user's
face.
• Face and Eye Detection: MediaPipe's facial landmark detection model is
used to locate the face and eyes within each frame.
• Eye Gaze Estimation: The position of the iris (pupil center) within the eye
region is analyzed to determine the direction of gaze.
• Cursor Control: The estimated gaze direction is mapped to cursor
movements on the screen.
• Click Simulation: A specific eye gesture, such as a prolonged gaze or a
blink, can be used to simulate a mouse click.
The system calibrates the mapping between gaze direction and cursor
movement to ensure accurate control; a minimal calibration sketch is given below.
• Blink Gestures: A single blink or a sequence of blinks can be programmed to
simulate a left-click or a right-click.
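The report does not fix a particular calibration procedure, so the following NumPy sketch only illustrates one common choice: fitting a per-axis scale and offset from a few calibration samples. The function names and the linear form are our assumptions, not part of the project code.

import numpy as np

def fit_calibration(gaze_points, screen_points):
    # Least-squares fit of screen_x = a*gaze_x + b (and likewise for y)
    # from samples recorded while the user looks at known screen targets.
    gaze = np.asarray(gaze_points, dtype=float)      # shape (N, 2), normalized gaze
    screen = np.asarray(screen_points, dtype=float)  # shape (N, 2), target pixels
    coeffs = []
    for axis in range(2):
        A = np.column_stack([gaze[:, axis], np.ones(len(gaze))])
        (a, b), *_ = np.linalg.lstsq(A, screen[:, axis], rcond=None)
        coeffs.append((a, b))
    return coeffs  # [(a_x, b_x), (a_y, b_y)]

def apply_calibration(coeffs, gaze_x, gaze_y):
    # Map a normalized gaze position to screen pixels using the fitted coefficients.
    (a_x, b_x), (a_y, b_y) = coeffs
    return a_x * gaze_x + b_x, a_y * gaze_y + b_y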
IMPLEMENTATION DETAILS
• MediaPipe: The Backbone of Real-Time Face Tracking. MediaPipe, a
versatile framework from Google, plays a pivotal role in our real-time face
tracking system. It offers a suite of pre-trained machine learning models
and pipelines, making it a powerful tool for various computer vision tasks.
• Face Mesh Model: Unraveling Facial Features. At the heart of our system
lies the Face Mesh model. This sophisticated model is designed to detect
and track a staggering 468 facial landmarks in real-time. These landmarks
represent key points on the face, such as the eyes, nose, mouth, and
eyebrows. By pinpointing these precise locations, we can accurately
analyze facial expressions and movements.
• Solutions API: A Streamlined Approach. MediaPipe's Solutions API
simplifies the integration of complex computer vision models into our
application. This API provides pre-trained models and pipelines, allowing
us to focus on the core functionality of our system rather than the
intricate details of model training and optimization. By leveraging the
Solutions API, we can quickly and efficiently deploy our face tracking
system.
• NumPy: The Powerhouse of Numerical Computation. NumPy, a
fundamental library for scientific computing in Python, is instrumental in
processing and manipulating the vast amount of image data generated by
our system. Its efficient array operations enable us to:
• Image Processing: NumPy provides tools for reading, writing, and
manipulating images as multi-dimensional arrays. This allows us to
perform operations like resizing, cropping, and color channel
manipulation.
• Mathematical Calculations: NumPy's extensive mathematical functions
are crucial for various tasks, including:
• Geometric Transformations: Applying transformations like rotation,
scaling, and translation to facial landmarks for accurate alignment and
analysis.
• Coordinate Calculations: Computing distances, angles, and other
geometric properties between facial landmarks to extract meaningful
information about facial expressions and movements (see the sketch after this list).
• Statistical Analysis: Analyzing the distribution of facial landmark positions
to identify patterns and trends.
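As a concrete illustration of the coordinate calculations listed above, the following small NumPy sketch computes the distance and angle between two face-mesh landmarks. The helper names are ours; the landmarks are assumed to expose normalized x and y attributes, as MediaPipe's face mesh landmarks do.

import numpy as np

def landmark_distance(p1, p2):
    # Euclidean distance between two landmarks in normalized image coordinates.
    return float(np.hypot(p2.x - p1.x, p2.y - p1.y))

def landmark_angle(p1, p2):
    # Angle (in degrees) of the line joining two landmarks, measured from the x-axis.
    return float(np.degrees(np.arctan2(p2.y - p1.y, p2.x - p1.x)))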
EXPLANATION OF THE PYTHON CODE
1) Libraries:
• OpenCV (cv2): Provides functionalities for video capture and image
processing.
• MediaPipe (mp): Offers pre-trained machine learning models for
various tasks, including face mesh detection.
• PyAutoGUI (pyautogui): Enables programmatic control of the mouse and
keyboard.
2) Functionality:
Initialization:
• The program starts by importing the required libraries.
• It then creates a VideoCapture object (cam) to access the webcam feed.
• A FaceMesh object (face_mesh) is initialized from the MediaPipe library
to detect facial landmarks in real-time.
• Finally, it retrieves the screen resolution using pyautogui.size().
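A minimal initialization sketch consistent with the steps above is shown below. The variable names mirror those mentioned in the text, while the constructor arguments (e.g. refine_landmarks=True, which enables the iris landmarks used later) are our assumptions about a typical setup.

import cv2
import mediapipe as mp
import pyautogui

cam = cv2.VideoCapture(0)                      # webcam feed
face_mesh = mp.solutions.face_mesh.FaceMesh(
    max_num_faces=1,
    refine_landmarks=True)                     # adds the iris landmarks (indices 468-477)
screen_w, screen_h = pyautogui.size()          # screen resolution in pixels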
3) Video Processing Loop:
• The program enters a continuous loop that captures video frames from
the webcam.
• Each frame is flipped horizontally for a more natural view (cv2.flip).
• The frame is converted from BGR (OpenCV color format) to RGB format
(cv2.cvtColor) to suit MediaPipe's processing needs.
• The face_mesh.process method is called on the RGB frame to detect
faces and extract their landmarks (facial key points).
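The per-frame preprocessing described in this step might look like the following sketch; the helper function and its name are ours, not taken from the project code.

import cv2

def preprocess_frame(frame, face_mesh):
    # Mirror the frame for a natural view, convert BGR -> RGB, and run the face mesh.
    frame = cv2.flip(frame, 1)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb)
    return frame, output.multi_face_landmarks   # None when no face is detected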
4) Eye Movement Detection:
• The program checks if any faces were detected in the frame
(landmark_points).
• If a face is found, it retrieves the specific landmarks corresponding to
the right eye (landmarks[474:478]).
• It iterates through these landmarks, drawing green circles on the frame
to visualize their positions.
• The program calculates the on-screen coordinates (screen_x, screen_y)
for the center of the right pupil based on the frame size and screen
resolution.
• It utilizes pyautogui.moveTo to move the mouse cursor to the
corresponding location on the screen.
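A minimal sketch of this cursor-mapping step, assuming the landmark's normalized x and y coordinates are scaled directly to the screen resolution (the helper name is ours):

import pyautogui

screen_w, screen_h = pyautogui.size()

def move_cursor_to(landmark):
    # Map a normalized landmark (x, y in 0..1) to screen pixels and move the cursor.
    pyautogui.moveTo(landmark.x * screen_w, landmark.y * screen_h)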
5) Click Detection:
• The program extracts the upper and lower eyelid landmarks of the left eye
(landmarks[145], landmarks[159]).
• It draws blue circles on the frame to visualize these points.
• The program calculates the vertical distance between these upper and
lower eyelid landmarks.
• If the distance is less than a specific threshold (indicating a blinking
motion), it triggers a mouse click using pyautogui.click.
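The blink-based click described above could be implemented roughly as follows. The threshold value and the short pause after the click (to avoid repeated clicks from a single blink) are assumptions, not values taken from the project code.

import pyautogui

BLINK_THRESHOLD = 0.004   # assumed threshold in normalized coordinates

def click_on_blink(upper_lid, lower_lid):
    # Trigger a left click when the vertical gap between the eyelid landmarks is small.
    if abs(upper_lid.y - lower_lid.y) < BLINK_THRESHOLD:
        pyautogui.click()
        pyautogui.sleep(1)   # debounce so one blink produces one click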
6) Display and Exit:
• The program displays the processed video frame with visualized
landmarks on a window titled "Eye Controlled Mouse."
• It waits for a key press (any key) using cv2.waitKey.
• Pressing a key likely exits the program loop, terminating the application.
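The display step can be sketched as below; exiting on the Esc key is our assumption, since the report only states that a key press ends the loop.

import cv2

def show_frame(frame):
    # Show the annotated frame and report whether Esc was pressed.
    cv2.imshow("Eye Controlled Mouse", frame)
    return cv2.waitKey(1) & 0xFF == 27   # True when Esc is pressed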
Overall, this program demonstrates the potential of computer vision for creating
an eye-controlled mouse interface. By tracking specific facial landmarks, it can
translate eye movements into mouse cursor control and potentially simulate
clicks.
FLOWCHART
The flowchart illustrates a system that uses a webcam to track eye
movements and translates them into cursor movement and left-click
actions on a computer screen.
i. Capturing Live Image
• The system starts by capturing a live image from the webcam. This
image serves as the input for the subsequent steps.
• Wait for 1 Second: A delay of 1 second is introduced. This might
be to allow for image processing and stability before proceeding
to the next step.
ii. Processing Landmarks of Left and Right Eye using MediaPipe: MediaPipe,
a computer vision framework, is employed to detect and process the
landmarks of the left and right eyes in the captured image. These
landmarks are crucial for determining eye movement.
iii. Calculating Eye Movements using X,Y Coordinates of the Frame: The X and
Y coordinates of the eye landmarks are used to calculate the direction and
magnitude of eye movement. This calculation likely involves comparing
the current landmark positions to previous frames to detect changes.
iv. Move the Cursor Accordingly / Perform a Left Click: Based on the
calculated eye movements, the system interprets the user's intent. If the
eye movement indicates a desired cursor movement, the cursor is moved
accordingly on the screen. If the eye movement signifies a left-click action,
the left mouse button is simulated.
v. Looping Mechanism: The flowchart appears to be a continuous loop. Once
the cursor movement or left-click action is executed, the system returns
to step 1, capturing a new image from the webcam and repeating the
process.
In essence, this system provides a way to control the computer cursor and
perform actions using eye movements instead of traditional input devices.
OUTPUT
Figure 1. Real-time cursor tracking with the help of eye movement.
FUTURE SCOPE
Eye-controlled mouse technology is an exciting field with immense potential for
future advancements. Here are some key areas of exploration:
• Advanced Machine Learning: Utilizing cutting-edge machine learning
algorithms to refine eye gaze estimation, leading to more precise cursor
control.
• Real-time Calibration: Developing systems that can automatically adapt to
individual users' unique eye characteristics, ensuring optimal
performance.
• Voice Commands: Integrating voice recognition with eye tracking to
enable hands-free control of various functions.
• Head Gestures: Incorporating head movements as an additional input
channel alongside gaze.
• Continuous Adaptation: Creating algorithms that can automatically adjust
to changes in lighting conditions, user fatigue, or other factors affecting
eye tracking accuracy.
• Personalized Profiles: Developing user profiles that store individual
calibration data, allowing for seamless and personalized experiences.
• Immersive Experiences: Leveraging eye tracking to enhance immersion in
virtual reality environments, enabling users to interact with virtual objects
using only their gaze.
• Intuitive Control: Developing natural and intuitive control schemes for
virtual reality applications, such as selecting objects, navigating menus,
and triggering actions.
CONCLUSION
The development of an eye-controlled mouse using MediaPipe and NumPy
marks a significant advancement in creating more inclusive and accessible
computing experiences. By accurately tracking eye movements and translating
them into precise cursor control, this technology empowers individuals with
limited mobility to interact with computers independently. This technology is
particularly beneficial for individuals with disabilities such as ALS, cerebral palsy,
or spinal cord injuries, who may have difficulty using traditional input devices.
The use of MediaPipe and NumPy in this project highlights the power of open-
source tools and libraries in driving innovation. MediaPipe, a cross-platform
framework for building multimodal applied machine learning pipelines, provides
robust tools for real-time face mesh detection and tracking. NumPy, a powerful
library for numerical computations, enables efficient processing of the captured
images and video frames.
As the field of computer vision continues to advance, we can expect further
improvements in eye-tracking technology. This could include increased accuracy,
faster processing speeds, and the ability to track eye movements in more
challenging environments. These advancements will lead to even more
innovative and user-friendly interfaces that cater to a wider range of users,
including those with disabilities.
In conclusion, the development of an eye-controlled mouse using MediaPipe
and NumPy represents a significant step towards creating more inclusive and
accessible computing experiences. This technology has the potential to
revolutionize the way individuals with limited mobility interact with computers,
empowering them to lead more independent and fulfilling lives.