Final Graduation Project Book

The document presents VoiceBridge, a mobile application aimed at enhancing communication accessibility on WhatsApp through speech-to-text and text-to-speech functionalities using large language models (LLMs). It details the project's objectives, development process, and the integration of LLMs to address accessibility challenges in text-based communication. The project also emphasizes the importance of user-friendly design and system performance evaluation to promote inclusive communication.

Uploaded by

Zola

Electronics and Communications Department

VoiceBridge: Bridging Communication Through Speech-to-Text and Text-to-Speech Using Large Language Models

A graduation project submitted to the Electronics and Communications Department
in partial fulfillment of the requirements for the degree of Bachelor of Electronics
and Communications Engineering

BY
Mahmoud Ahmed Elghazouly
Ahmed Mohamed Shaker Zain Eldeen
Moustafa Basheer Mohamed
Mohamed Ehab
Magdy Yasser

SUPERVISED BY Dr. Amany

2025
List of Contents

Chapter 1: Introduction
1.1 Overview
1.1.1 Background and Motivation
1.1.2 Introducing VoiceBridge
1.1.3 Core Functionality
1.2 Problem Statement
1.2.1 Accessibility Challenges in Text-Based Communication
1.2.2 Limitations of Existing Solutions
1.2.3 The Need for an Intelligent Voice-Based Solution
1.3 Objectives
1.3.1 Development of a VoiceBridge Mobile Application
1.3.2 Development of LLMs for Speech-to-Text and Text-to-Speech
1.3.3 Fine-tuning of LLMs for Optimal Performance
1.3.4 Real-time or Near Real-time Conversion
1.3.5 User-Friendly Interface
1.3.6 Evaluation of System Performance
1.4 Importance
1.4.1 Enhancing Accessibility
1.4.2 Promoting Inclusive Communication
1.4.3 Exploring the Potential of LLMs in Communication Aids
1.4.4 Potential for Future Development

Chapter 2: Background
2.1 WhatsApp and Accessibility
2.1.1 Current Accessibility Features in WhatsApp
2.1.2 Limitations of Current Features
2.2 Large Language Models (LLMs)
2.2.1 Overview of LLMs
2.2.2 LLMs for Speech-to-Text Conversion
2.2.3 LLMs for Text-to-Speech Conversion
2.3 NotificationListenerService (Android)
2.3.1 Functionality and Usage
2.3.2 Permissions and Privacy Considerations
2.4 Server Deployment
2.4.1 Cloud-Based vs. Local Server
2.4.2 API Design for LLM Integration

Chapter 3: Mobile Application Development
3.1 Application Requirements and Specifications
3.1.1 Functional Requirements
3.1.2 Non-Functional Requirements (Performance, Usability, etc.)
3.2 Application Architecture and Design
3.2.1 Application Flow and Navigation
3.2.2 User Interface (UI) Design
3.2.3 Data Management
3.3 Implementation Details
3.3.1 Development Environment and Tools (e.g., Android Studio, Kotlin)
3.3.2 Notification Listener Implementation
3.3.3 Integration with LLM API
3.3.4 Speech Input/Output Handling
3.4 Testing and Debugging
3.4.1 Unit Testing of Application Components
3.4.2 Integration Testing with LLM Server
3.4.3 User Interface Testing

Chapter 4: Machine Learning Models
4.1 Model Selection and Architecture
4.1.1 Speech-to-Text Model Details (e.g., model type, layers)
4.1.2 Text-to-Speech Model Details (e.g., model type, vocoder)
4.1.3 Rationale for Model Choices
4.2 Model Development from Scratch
4.2.1 Data Collection and Preparation
4.2.2 Model Training Process
4.2.3 Training Environment and Resources
4.3 Model Evaluation and Performance Metrics
4.3.1 Speech-to-Text Accuracy Metrics (e.g., WER)
4.3.2 Text-to-Speech Quality Metrics (e.g., MOS)
4.3.3 Model Inference Time and Resource Usage

Chapter 5: Fine-Tuning of Large Language Models
5.1 Rationale for Fine-Tuning
5.1.1 Adapting Models for WhatsApp Communication Style
5.1.2 Improving Accuracy and Contextual Understanding
5.2 Fine-Tuning Data Preparation
5.2.1 Collection of WhatsApp-Specific Data
5.2.2 Data Cleaning and Formatting
5.2.3 Data Augmentation Techniques (if used)
5.3 Fine-Tuning Methodology and Parameters
5.3.1 Fine-Tuning Approach (e.g., transfer learning)
5.3.2 Hyperparameter Selection and Tuning
5.3.3 Evaluation Metrics During Fine-Tuning
5.4 Results and Analysis of Fine-Tuning
5.4.1 Comparison of Performance Before and After Fine-Tuning
5.4.2 Impact of Fine-Tuning on Specific Metrics (Accuracy, Fluency)
5.4.3 Qualitative Analysis of Fine-Tuned Model Output

Chapter 6: Security Considerations
6.1 Privacy of WhatsApp Data
6.1.1 Handling of Notification Data
6.1.2 Anonymization and Pseudonymization Techniques (if applicable)
6.1.3 Compliance with Data Protection Regulations
6.2 Security of Communication Channels
6.2.1 Secure API Communication (e.g., HTTPS)
6.2.2 Encryption of Data in Transit
6.3 Server Security
6.3.1 Access Control and Authentication
6.3.2 Protection Against Unauthorized Access
6.3.3 Regular Security Audits and Updates
6.4 Application Security
6.4.1 Protection Against Reverse Engineering
6.4.2 Secure Data Storage on the Device
6.4.3 Input Validation and Sanitization

Chapter 7: System Integration and Testing
7.1 Integration of Application and LLM Server
7.1.1 API Communication Testing
7.1.2 Data Flow Verification
7.2 End-to-End System Testing
7.2.1 Test Cases for Different Scenarios (e.g., varying message lengths)
7.2.2 Performance Testing (Latency, Throughput)
7.2.3 Usability Testing with Target Users
7.3 Results and Analysis
7.3.1 Overall System Performance Evaluation
7.3.2 Identification of Bottlenecks or Issues
7.3.3 Recommendations for Further Improvement

Chapter 8: Conclusion and Future Work
8.1 Conclusion
8.2 Future Work
8.2.1 Enhanced Language Support
8.2.2 Integration with Other Messaging Platforms
8.2.3 Personalized User Profiles
8.2.4 Offline Functionality

References

Appendices

List of Figures

Figure 1.1: High-Level Overview of the VoiceBridge System
Figure 1.2: Use Case Diagram for VoiceBridge Application
Figure 2.1: WhatsApp Current Accessibility Features (Screenshot Example)
Figure 2.2: Limitations of WhatsApp Accessibility - Text Size Issue (Screenshot)
Figure 2.3: LLM Architecture for Speech-to-Text Conversion
Figure 2.4: LLM Architecture for Text-to-Speech Conversion
Figure 2.5: NotificationListenerService Workflow in Android
Figure 3.1: VoiceBridge Application Workflow Diagram
Figure 3.2: Sequence Diagram of WhatsApp Notification Capture and Processing
Figure 3.3: User Interface Mockup - Main Conversation Screen (Text View)
Figure 3.4: User Interface Mockup - Main Conversation Screen (Voice Input)
Figure 3.5: Class Diagram for VoiceBridge Application
Figure 3.6: Data Flow Diagram of the VoiceBridge System
Figure 3.7: Entity-Relationship Diagram for Data Management
Figure 3.8: Unit Test Example - Speech-to-Text Module
Figure 4.1: Architecture of the Speech-to-Text Model (Detailed)
Figure 4.2: Architecture of the Text-to-Speech Model (Detailed)
Figure 4.3: Data Collection Pipeline for LLM Training
Figure 4.4: Example of Spectrogram Visualization from Speech Data
Figure 4.5: Graph of Word Error Rate (WER) During Training
Figure 4.6: Graph of MOS Score for Text-to-Speech Output
Figure 5.1: Fine-Tuning Process Overview
Figure 5.2: Example of WhatsApp-Specific Data for Fine-Tuning
Figure 5.3: Fine-Tuning Data Augmentation Techniques (Diagram)
Figure 5.4: Graph of Learning Rate During Fine-Tuning
Figure 5.5: Comparison of WER Before and After Fine-Tuning
Figure 5.6: Qualitative Comparison of Speech Output (Spectrograms)
Figure 6.1: Secure API Communication Flow (HTTPS)
Figure 6.2: System Architecture Diagram with Security Layers
Figure 6.3: End-to-End System Testing Setup
Figure 6.4: Latency Measurement Results for Voice Conversion
Figure 7.1: Enhanced Language Support Options (UI Mockup)
Figure 7.2: Integration with Multiple Messaging Platforms (Diagram)
Figure 7.3: User Profile Customization Options (UI Mockup)

Abstract
This graduation project introduces VoiceBridge, a mobile application designed to
enhance communication accessibility on the popular messaging platform WhatsApp.
VoiceBridge uses server-deployed large language models (LLMs) to convert WhatsApp
messages between speech and text in real time or near real time. The application
uses Android's NotificationListenerService to capture notifications for incoming
messages, which are then sent to the server for processing by the LLMs. The
converted output, whether speech or text, is relayed back to the user within the
application interface. The project addresses the accessibility challenges faced by
individuals who have difficulty with text-based communication by offering a
hands-free, voice-driven way to interact with WhatsApp. The work covers the design
and implementation of the mobile application, the integration and deployment of
suitable LLMs, and the evaluation of the system's performance in terms of accuracy,
latency, and usability.

Chapter 1: Introduction

1.1 Overview

1.1.1 Background and Motivation

1.1.2 Introducing Voicebridge

1.1.3 Core Functionality

1.2 Problem Statement


1.2.1 Accessibility Challenges in Text-Based Communication

1.2.2 Limitations of Existing Solutions

1.2.3 The Need for an Intelligent Voice-Based Solution

1.3 Objectives

1.3.1 Development of a Voicebridge Mobile Application

1.3.2 Development of LLMs for Speech-to-Text and Text-to-Speech

1.3.3 Fine-tuning of LLMs for Optimal Performance

1.3.4 Real-time or Near Real-time Conversion

1.3.5 User-Friendly Interface

1.3.6 Evaluation of System Performance

1.4 Importance

1.4.1 Enhancing Accessibility

1.4.2 Promoting Inclusive Communication

1.4.3 Exploring the Potential of LLMs in Communication Aids

1.4.4 Potential for Future Development

Chapter 2: Background

2.1 WhatsApp and Accessibility


2.1.1 Current Accessibility Features in WhatsApp

2.1.2 Limitations of Current Features

2.2 Large Language Models (LLMs)

2.2.1 Overview of LLMs

2.2.2 LLMs for Speech-to-Text Conversion

2.2.3 LLMs for Text-to-Speech Conversion

2.3 NotificationListenerService (Android)

2.3.1 Functionality and Usage

2.3.2 Permissions and Privacy Considerations

2.4 Server Deployment

2.4.1 Cloud-Based vs. Local Server

2.4.2 API Design for LLM Integration

Chapter 3: Mobile Application Development

3.1 Application Requirements and Specifications

3.1.1 Functional Requirements

3.1.2 Non-Functional Requirements (Performance, Usability, etc.)


3.2 Application Architecture and Design

3.2.1 Application Flow and Navigation

3.2.2 User Interface (UI) Design

3.2.3 Data Management

3.3 Implementation Details

3.3.1 Development Environment and Tools (e.g., Android Studio, Kotlin)

3.3.2 Notification Listener Implementation

3.3.3 Integration with LLM API

3.3.4 Speech Input/Output Handling

3.4 Testing and Debugging

3.4.1 Unit Testing of Application Components

3.4.2 Integration Testing with LLM Server

3.4.3 User Interface Testing

Chapter 4: Machine Learning Models

4.1 Model Selection and Architecture


4.1.1 Speech-to-Text Model Details (e.g., model type, layers)

4.1.2 Text-to-Speech Model Details (e.g., model type, vocoder)

4.1.3 Rationale for Model Choices

4.2 Model Development from Scratch

4.2.1 Data Collection and Preparation

4.2.2 Model Training Process

4.2.3 Training Environment and Resources

4.3 Model Evaluation and Performance Metrics

4.3.1 Speech-to-Text Accuracy Metrics (e.g., WER)

4.3.2 Text-to-Speech Quality Metrics (e.g., MOS)

4.3.3 Model Inference Time and Resource Usage

Chapter 5: Fine-Tuning of Large Language Models

5.1 Rationale for Fine-Tuning

5.1.1 Adapting Models for WhatsApp Communication Style

5.1.2 Improving Accuracy and Contextual Understanding

5.2 Fine-Tuning Data Preparation


5.2.1 Collection of WhatsApp-Specific Data

5.2.2 Data Cleaning and Formatting

5.2.3 Data Augmentation Techniques (if used)

5.3 Fine-Tuning Methodology and Parameters

5.3.1 Fine-Tuning Approach (e.g., transfer learning)

5.3.2 Hyperparameter Selection and Tuning

5.3.3 Evaluation Metrics During Fine-Tuning

5.4 Results and Analysis of Fine-Tuning

5.4.1 Comparison of Performance Before and After Fine-Tuning

5.4.2 Impact of Fine-Tuning on Specific Metrics (Accuracy, Fluency)

5.4.3 Qualitative Analysis of Fine-Tuned Model Output

Chapter 6: Security Considerations

6.1 Privacy of WhatsApp Data

6.1.1 Handling of Notification Data

6.1.2 Anonymization and Pseudonymization Techniques (if applicable)

6.1.3 Compliance with Data Protection Regulations

6.2 Security of Communication Channels


6.2.1 Secure API Communication (e.g., HTTPS)

6.2.2 Encryption of Data in Transit

6.3 Server Security

6.3.1 Access Control and Authentication

6.3.2 Protection Against Unauthorized Access

6.3.3 Regular Security Audits and Updates

6.4 Application Security

6.4.1 Protection Against Reverse Engineering

6.4.2 Secure Data Storage on the Device

6.4.3 Input Validation and Sanitization

Chapter 7: System Integration and Testing

7.1 Integration of Application and LLM Server

7.1.1 API Communication Testing

7.1.2 Data Flow Verification

7.2 End-to-End System Testing

7.2.1 Test Cases for Different Scenarios (e.g., varying message lengths)

7.2.2 Performance Testing (Latency, Throughput)

7.2.3 Usability Testing with Target Users

7.3 Results and Analysis

7.3.1 Overall System Performance Evaluation

7.3.2 Identification of Bottlenecks or Issues

7.3.3 Recommendations for Further Improvement

Chapter 8: Conclusion and Future Work

8.1 Conclusion

8.2 Future Work

8.2.1 Enhanced Language Support

8.2.2 Integration with Other Messaging Platforms

8.2.3 Personalized User Profiles

8.2.4 Offline Functionality

References

Appendices
