STUTTERING DETECTION
by
LALITHA LAKSHMI D
(REGISTER NO 42734009)
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
NOVEMBER 2023
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited "A++" Grade by NAAC | 12B Status by UGC | Approved by AICTE
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of
LALITHA LAKSHMI D (42734009) who carried out the project entitled
“STUTTERING DETECTION” under our supervision from
September 2023 to November 2023.
Internal Guide
Ms. VINODHINI K, M.Sc., Assistant Professor
DECLARATION
ACKNOWLEDGEMENT
LALITHA LAKSHMI D
ABSTRACT
The aim of this project is to develop a new algorithm to enhance speech
recognition for people who stutter. Stuttering identification is an interesting
research problem involving pathology, acoustics, and signal processing, which
makes it hard and complicated to detect. The basic idea is to first remove
stuttering from the sample, using an amplitude threshold obtained from a neural
network, and then pass on the clean sample.
Stuttering is a disorder which affects the fluency of speech through involuntary
repetition of words, syllables, etc., or involuntary silent intervals. Of all of these,
repetition is the most common and prominent characteristic of stuttering. The
increasing use of stuttering-aware speech recognition by people who suffer from
the disorder has made access easier in their lives.
TABLE OF CONTENTS

TABLE OF FIGURES
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Stuttering is one manifestation of speech disorders, in which the delivery of speech is
not smooth: pronunciations are repeated, prolonged, blocked, or stalled at the syllable or
phone level. The incidence rate of stuttering is high. In social groups, about 1% of people
stutter, and as many as 6% to 6.6% of children do. About 80% of childhood stuttering
disappears spontaneously, while the remaining 20% - that is, about 1% of the entire
population - have difficulty returning to normal and finally develop "developmental
stuttering". The stuttering phenomenon can be divided into the following categories:
c) Blocking stuttering: At the beginning of a sentence the words may still be barely
pronounceable; however, when a hard pronunciation is met, the words are blocked.
d) Ankylosing stuttering: Once a few bursts of stuttering occur, the speaker becomes
nervous and the tongue seems to freeze, so that even easy words cannot be pronounced.
e) Difficult pronunciation stuttering: The first sound of every sentence cannot be produced;
although the speaker tries hard, only a dull, low "ah" or "eh" sound comes out. Among
these, bursts stuttering and reciprocating stuttering are the most common types of stuttering
and are among the main factors affecting speech fluency.
The work involves one-dimensional speech signal analysis and a study of the factors
influencing audio quality. The pipeline consists of stuttered data collection, signal
preprocessing and a feature extraction algorithm, followed by disorder identification and
correction using different classification techniques.
STUTTERED SPEECH RECOGNITION: TRADITIONAL MACHINE LEARNING &
DEEP LEARNING BASED APPROACHES
Finally, different classification and clustering methods are used to recognize stuttered speech.
Early studies on stuttered speech recognition were mainly based on DTW score matching and
traditional machine learning algorithms. Before discussing different approaches to classifying
stuttering, we recapitulate some basics of machine learning. Machine learning is
a subfield of the broader family of Artificial Intelligence.
TYPICAL DISFLUENCY
If there is a breakdown in fluency, then we can say that the resultant speech is disfluent.
Disfluencies occur frequently in typical spontaneous speech, at a rate of around 6 per 100
words (Bortfeld, Leon, Bloom, Schober & Brennan, 2001; Eklund, 2004; Fox Tree, 1995;
Shriberg, 1994). They occur at a higher rate in longer utterances (Oviatt, 1995; Shriberg, 1994)
and in more complex utterances (Lickley, 2001; Shriberg, 1994). Individuals vary considerably
in the rate at which they produce disfluencies, but it is difficult to find a speaker who is never
disfluent. The word ‘disfluency’ is defined in several different ways in the research literature
and there seems to be no consensus on what phenomena it includes, so it is important to begin
this piece with our own operational definition. Our definition of fluency refers to the flow of
speech, so disfluency involves a break in that flow, when the speaker stops for a moment in a
place or for a length of time not predicted by typical fluent production.
TITLE 2: SPEECH PROCESSING FOR STUTTER DETECTION
TITLE 4: A PROPOSED FRAMEWORK FOR STUTTER DETECTION:
IMPLEMENTATION ON EMBEDDED SYSTEMS
YEAR AND AUTHOR: 2022, Abhijit S. Pandya, Bassem Alhalabi, Harshal
A. Sanghvi and Jonathan Taylor
DESCRIPTION:
It is estimated that more than 70 million people in the world stutter. One of the major problems
facing speech professionals who work with stuttering patients is quantitatively monitoring and
tracking improvement inside and outside therapy sessions. After extensive research, it was
proposed to develop a biomedical device that could be worn daily by patients to monitor and
record key events in everyday conversations, so that instances of stuttering can later be
analyzed by speech professionals. This biomedical innovation is intended to assist health
professionals and caretakers of stuttering individuals in helping them overcome this behaviour
and compete in the real world. The paper describes in detail a feasibility study carried out and
the prototype developed for such a device, and contemplates its future uses and developments.
The device provides data on various parameters of stuttering that need to be evaluated, and
this evaluation speeds up the therapy provided by health professionals.
YEAR AND AUTHOR: 2013, Bin Dong, Junbo Zhang and Yonghong Yan
DESCRIPTION:
An algorithm for the computer detection of Chinese repetitive stuttering is studied. According
to the features of repetitions in Chinese stuttered speech, improvement solutions are provided
based on previous research findings. First, a multi-span looping forced-alignment decoding
network is designed to detect multi-syllable repetitions in Chinese stuttered speech. Second, a
branch penalty factor is added to the network to adjust the decoding trend using recursive
search, in order to reduce the error arising from the complexity of the decoding network.
Finally, the detected stutters are re-judged by calculating a confidence score to improve the
reliability of the detection result.
CHAPTER 2
SYSTEM ANALYSIS
The existing methods for stuttering detection employ spectral features such as mel-
frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs),
or their variants, that capture formant-related information. Other spectral features such as
pitch, zero-crossing rate, shimmer, and spectral spread are also used. Those features
are then modeled with statistical methods such as hidden Markov models (HMMs), support
vector machines (SVMs), Gaussian mixture models (GMMs), etc. An alternative strategy for
stuttering detection is to apply ASR to the audio speech signal to obtain the spoken text and
then to use language models. Even though this method of detecting stuttering has achieved
encouraging results and has been proven effective, the reliance on ASR makes it
computationally expensive and prone to error.
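As a minimal sketch of this classical feature-plus-classifier pipeline (assuming 16 kHz mono WAV clips and the librosa and scikit-learn packages; the file names and labels below are illustrative stand-ins, not the report's dataset):

import librosa
import numpy as np
from sklearn.svm import SVC

def clip_features(path):
    # Load a clip and average 13 MFCCs over time into one fixed-length vector.
    y, rate = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=rate, n_mfcc=13)
    return mfcc.mean(axis=1)

# Hypothetical labelled clips: 1 = stuttered, 0 = fluent.
paths = ["stutter_01.wav", "fluent_01.wav", "stutter_02.wav", "fluent_02.wav"]
labels = [1, 0, 1, 0]
X = np.array([clip_features(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))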
Characteristics of stuttering
Stuttering is a speech disorder that is usually first detected between the ages of 18 and
24 months. It is mainly a problem of fluency, and the delivery of speech in cases of
stuttering varies considerably across different speaking situations, such as diplomatic,
official, or presentation modes of conversation versus conversation in the home
atmosphere. There are several causes of stuttering; some of these, along with the types,
are discussed in this section. Based on previous studies and the literature, the common
causes of stuttering are genetic, physiological, congenital, auditory, and environmental.
A brief discussion of each follows:
• Genetics – Recent international research has identified certain genes associated with
stuttering, and genetic family linkage plays a role.
• Physiological – Brain imaging studies suggest that slight dysfunction of the brain
during speaking causes the lack of speech production and the inability to keep the
fluency of words or sentences; weakness of the human nervous system may also act in
the process.
• Congenital – Congenital factors such as physical trauma at the time of birth, cerebral
palsy, or retardation may cause stuttering; the conditions are also found among siblings
and during sudden growth in linguistic ability.
• Auditory – Deafness and hardness of hearing have an impact on stuttering; a slow
response to audio increases the stuttering habit.
• Environmental – An uncomfortable and stressful situation is a significant reason for
the development of stuttering behaviours.
CHAPTER 3
SYSTEM CONFIGURATION
3.3 Domain
Machine Learning
3.4 PyCharm
CHAPTER 4
SYSTEM DESIGN
In this chapter, the various UML diagrams for the Stuttering Detection System are
represented and the various functionalities are explained.
Use case diagrams are used for high-level requirement analysis of a system. When
the requirements of a system are analysed, the functionalities are captured in use cases,
so use cases can be described as the system functionalities written in an organized
manner. The second element relevant to use cases is the actors. An actor can be defined
as anything that interacts with the system: a human user, an internal application, or an
external application. Use case diagrams are used to gather the requirements of a system,
including internal and external influences; these requirements are mostly design
requirements. Hence, when a system is analysed to gather its functionalities, use cases
are prepared and actors are identified. In the Unified Modeling Language (UML), a use
case diagram summarizes the details of the system's users (also known as actors) and
their interactions with the system, built from a set of specialized symbols and connectors.
An effective use case diagram represents the scenarios in which the system or application
interacts with people, organizations, or external systems, and the goals that the system
helps those entities (actors) achieve.
[Use case diagram: speech input, dataset train, speech preprocessing, feature extraction, classification]
Sequence diagrams model the flow of logic within the system in a visual manner,
enabling both documentation and validation of the logic, and are commonly used for both
analysis and design purposes. Figure 3.2 shows the sequence diagram of the stuttering
detection system.
[Figure 3.2 Sequence diagram: speech input, HMM conversion, dataset train, speech preprocessing, feature extraction, classification, stuttering word detection]
The next interaction diagram is the collaboration diagram, which shows the object
organization. In a collaboration diagram, the method-call sequence is indicated by a
numbering technique: the numbers indicate how the methods are called one after another.
The method calls are similar to those of a sequence diagram, but the difference is that the
sequence diagram does not describe the object organization, whereas the collaboration
diagram does.
[Collaboration diagram: speech input, voice module (voice to text), dataset train, speech preprocessing (NLP preprocessing), feature extraction, classification, stuttering detection]
the implementation, allowing for better decision-making about task assignment or needed skill
improvements. System administrators can use component diagrams to plan ahead, using the
view of the logical software components and their relationships on the system. Components
communicate with each other using interfaces. The interfaces are linked using connectors.
[Component diagram of stuttering detection: speech input, voice module, dataset train, speech preprocessing, feature extraction, classification, stuttering word detection]
A deployment diagram shows the hardware of the system and the software on that
hardware. Deployment diagrams are useful when a software solution is deployed across
multiple machines, such as sensor nodes, cluster heads, and a base station, each with a
unique configuration. A deployment diagram is a UML diagram type that shows the
execution architecture of a system, including nodes such as hardware or software execution
environments, and the middleware connecting them. Deployment diagrams are typically
used to visualize the physical hardware and software of a system; using one, you can
understand how the system will be physically deployed on the hardware. Deployment
diagrams help model the hardware topology of a system, in contrast to other UML diagram
types, which mostly outline the logical components of a system. A UML deployment
diagram shows the configuration of run-time processing nodes and the components that
live on them. Deployment diagrams are a kind of structure diagram used in modeling the
physical aspects of an object-oriented system and are often used to model the static
deployment view of a system. The deployment diagram in Figure 3.6 shows how the
modules are deployed in the system.
Package diagrams are used to reflect the organization of packages and their elements.
When used to represent class elements, package diagrams provide a visualization of the
namespaces. Package diagrams are used to structure high-level system elements and can
simplify complex class diagrams by grouping classes into packages. A package is a
collection of logically related UML elements, depicted as a file folder, and can be used on
any of the UML diagrams. Package diagrams are structural diagrams used to show the
organization and arrangement of various model elements in the form of packages. A
package is a grouping of related UML elements, such as diagrams, documents, classes, or
even other packages; each element is nested within its package, which is depicted as a file
folder and arranged hierarchically within the diagram. Package diagrams are most
commonly used to provide a visual organization of the layered architecture within any
UML classifier, such as a software system. Figure 3.7 shows the package diagram for the
developed application and represents how the elements are logically related.
a blueprint that you use as a guide, so that you and your colleagues can discuss, improve and
follow.
[Activity diagram: speech input, voice module, dataset train, preprocessing, classification (yes/no)]
4.2.2 Architectural Description
A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. System architecture can comprise system
components, the externally visible properties of those components, the relationships (e.g. the
behavior) between them. It can provide a plan from which products can be procured, and
systems developed, that will work together to implement the overall system. There have been
efforts to formalize languages to describe system architecture; collectively these are called
architecture description languages (ADLs).
➢ An allocated arrangement of physical elements which provides the design solution for
a consumer product or life-cycle process intended to satisfy the requirements of the
functional architecture and the requirements baseline.
➢ Architecture comprises the most important, pervasive, top-level, strategic inventions,
decisions, and their associated rationales about the overall structure (i.e., essential
elements and their relationships) and associated characteristics and behavior.
➢ If documented, it may include information such as a detailed inventory of current
hardware, software and networking capabilities; a description of long-range plans and
priorities for future purchases, and a plan for upgrading and/or replacing dated
equipment and software.
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 MODULES
• SPEECH INPUT
• VOICE MODULE
• DATASET TRAIN
• SPEECH PREPROCESSING
• CLASSIFICATION
• STUTTER WORD DETECTION
1. SPEECH INPUT
Speech input is captured from the user through a microphone and passed to the rest of
the pipeline. For example, a short utterance can be recorded and saved as a WAV file
for later analysis.
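A minimal sketch of this capture step, assuming the speech_recognition package (with PyAudio installed for microphone access); the output file name is an illustrative choice:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:            # default input device
    r.adjust_for_ambient_noise(source)     # calibrate against background noise
    print("Speak now...")
    audio = r.listen(source)               # record one utterance

# Save the utterance as WAV for the later pipeline stages.
with open("speech_input.wav", "wb") as f:
    f.write(audio.get_wav_data())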
2. VOICE MODULE
Voice to text converters have become a necessary tool for individuals and businesses alike.
These tools use speech recognition technology to convert audio files, including voice
commands and speech from video files, into a text transcription.
• Record your voice: Start by recording your voice on a device such as an iPhone or
Android smartphone, or on your Mac or PC. The recorded audio is often saved as a
WAV file, but other formats are typically supported as well.
• Choose a transcription tool: Upload the audio recording to a transcription software
or online tool. This could be an app, a desktop program, or a browser-based online
tool. Some of these tools even offer real-time transcription.
• Transcribe audio: The transcription service will convert your audio file to a text file,
often in TXT or DOC format. Many services offer high-quality transcription,
though accuracy can vary. Some tools also allow you to convert speech directly to
text online, without the need for an audio recording.
• Edit the text: After transcription, you can edit the text to ensure it accurately
represents your voice recording. Many tools offer integrated editing
functionality. There are several free speech-to-text tools you can use; a minimal
transcription sketch follows this list.
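The sketch below transcribes a saved recording with the speech_recognition package; recognize_google calls the free Google Web Speech API and needs an internet connection:

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("speech_input.wav") as source:
    audio = r.record(source)               # read the entire file
try:
    print(r.recognize_google(audio))       # text transcription
except sr.UnknownValueError:
    print("Speech was not intelligible")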
3. DATASET TRAIN
Training data is labeled data used to teach AI models or machine learning algorithms
to make proper decisions. For example, if you are trying to build a model for a self-driving
car, the training data will include images and videos labeled to identify cars versus street
signs versus people. Here the stutter dataset is analyzed using FluencyBank, a shared
database for the study of fluency development built by Nan Bernstein Ratner (University
of Maryland) and Brian MacWhinney (Carnegie Mellon University). The platform provides
audio and video files with transcriptions of adults and children who stutter; the FluencyBank
subset used here is interview data from 32 people who stutter (PWS).
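A minimal sketch of preparing such labelled clips for training, assuming the audio has been exported into per-class folders (the dataset/ layout and the 80/20 split are illustrative choices, not taken from the report):

import glob
from sklearn.model_selection import train_test_split

files, labels = [], []
for label, folder in enumerate(["fluent", "stutter"]):
    for path in glob.glob("dataset/%s/*.wav" % folder):
        files.append(path)
        labels.append(label)        # 0 = fluent, 1 = stutter

# Hold out 20% of the labelled clips for testing.
train_files, test_files, train_y, test_y = train_test_split(
    files, labels, test_size=0.2, stratify=labels, random_state=0)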
4. SPEECH PREPROCESSING
NLP preprocessing is used to preprocess the transcripts for stutter speech detection.
Artificial intelligence has become part of our everyday lives: Alexa and Siri, text and email
autocorrect, customer service chatbots. They all use machine learning algorithms and
Natural Language Processing (NLP) to process, "understand", and respond to human
language, both written and spoken. Although NLP and its sister study, Natural Language
Understanding (NLU), are constantly growing in huge leaps and bounds in their ability to
compute words and text, human language is incredibly complex, fluid, and inconsistent,
and presents serious challenges that NLP has yet to completely overcome. Misspelled or
misused words can create problems for text analysis. Autocorrect and grammar-correction
applications can handle common mistakes but don't always understand the writer's
intention. With spoken language, mispronunciations, different accents, stutters, etc., can
be difficult for a machine to understand. However, as language databases grow and smart
assistants are trained by their individual users, these issues can be minimized.
Keyword Extraction
The final key to the text analysis puzzle, keyword extraction, is a broader form of the
techniques already covered. By definition, keyword extraction is the automated process of
extracting the most relevant information from text using AI and machine learning
algorithms. The software can be tuned to search for the keywords relevant to a given need.
More technical than the other topics, lemmatization and stemming refer to the breakdown,
tagging, and restructuring of text data based on either the root stem or the definition. That
might seem like saying the same thing twice, but both sorting processes can yield different
valuable data; each process is described step by step under the methodology in Chapter 6.
5. CLASSIFICATION
The ANN model is used to detect the stuttered events. The particular stuttered events to be
located are repetitions and prolongations, because these are ubiquitous in stuttered speech.
Artificial neural networks are widely applied in the field of classification. One of the main
reasons for the popularity of ANNs in this area is that, in contrast to traditional statistical
methods, networks adjust to the data without the need to define any additional function or
distribution for the input variables. They are also able to determine the probability of an
element belonging to a group, which permits using an ANN as an a posteriori probability
estimator for specified objects.
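A minimal sketch of such an ANN classifier (scikit-learn's MLPClassifier; the random arrays stand in for the MFCC feature vectors and labels produced by the earlier modules, and the hidden-layer size is an illustrative choice):

import numpy as np
from sklearn.neural_network import MLPClassifier

X_train = np.random.rand(40, 13)            # stand-in MFCC feature vectors
y_train = np.random.randint(0, 2, 40)       # 1 = repetition/prolongation, 0 = fluent
X_test = np.random.rand(10, 13)
y_test = np.random.randint(0, 2, 10)

ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
ann.fit(X_train, y_train)
print("accuracy:", ann.score(X_test, y_test))
print("posterior:", ann.predict_proba(X_test[:1]))   # a posteriori class probabilities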
6. STUTTER WORD DETECTION
The stuttered word is detected after classifying the trained and tested dataset of the stutter
detection.
(The main application code is listed in full in Appendix 9.1.)
CHAPTER 6
SYSTEM TESTING
6.1 METHODOLOGY
1. NLP text processing
Step 1: Sentence segmentation
Sentence segmentation is the first step in the NLP pipeline. It divides the entire paragraph
into separate sentences for better understanding. For example, the paragraph "London is
the capital and most populous city of England and the United Kingdom. Standing on the
River Thames in the southeast of the island of Great Britain, London has been a major
settlement for two millennia. It was founded by the Romans, who named it Londinium."
is segmented into:
“London is the capital and most populous city of England and the United Kingdom.”
“Standing on the River Thames in the southeast of the island of Great Britain, London has
been a major settlement for two millennia.”
“It was founded by the Romans, who named it Londinium.”
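A minimal sketch of this step with NLTK (the punkt tokenizer data is downloaded on first use):

import nltk
nltk.download('punkt', quiet=True)
from nltk.tokenize import sent_tokenize

text = ("London is the capital and most populous city of England and the "
        "United Kingdom. Standing on the River Thames in the southeast of the "
        "island of Great Britain, London has been a major settlement for two "
        "millennia. It was founded by the Romans, who named it Londinium.")
for sentence in sent_tokenize(text):
    print(sentence)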
Step 2: Word tokenization
Word tokenization breaks the sentence into separate words or tokens. This helps understand
the context of the text. When tokenizing the sentence “London is the capital and most
populous city of England and the United Kingdom”, it is broken into separate words, i.e.,
“London”, “is”, “the”, “capital”, “and”, “most”, “populous”, “city”, “of”, “England”,
“and”, “the”, “United”, “Kingdom”, “.”
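The same example with NLTK's word tokenizer:

import nltk
nltk.download('punkt', quiet=True)
from nltk.tokenize import word_tokenize

tokens = word_tokenize("London is the capital and most populous city "
                       "of England and the United Kingdom.")
print(tokens)    # ['London', 'is', 'the', 'capital', ..., 'Kingdom', '.']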
Step 3: Stemming
Stemming helps in preprocessing text by normalizing words into their base or root form.
For example, intelligently, intelligence, and intelligent all reduce to the single root
‘intelligen’, even though in English there is no such word as ‘intelligen’.
Step 4: Lemmatization
Lemmatization removes inflectional endings and returns the canonical form of a word or
lemma. It is similar to stemming except that the lemma is an actual word. For example,
‘playing’ and ‘plays’ are forms of the word ‘play’. Hence, play is the lemma of these words.
Unlike a stem (recall ‘intelligen’), ‘play’ is a proper word.
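Both operations side by side with NLTK (the wordnet data used by the lemmatizer is downloaded on first use):

import nltk
nltk.download('wordnet', quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
for word in ["intelligently", "intelligence", "intelligent", "playing", "plays"]:
    # stem (may not be a real word) vs. lemma (a proper dictionary word)
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))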
Step 5: Stop words
The next step is to consider the importance of each word in a given sentence. In English,
some words appear more frequently than others, such as "is", "a", "the", and "and". As they
appear often, the NLP pipeline flags them as stop words. They are filtered out so as to focus
on the more important words.
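Filtering stop words with NLTK's built-in English list:

import nltk
nltk.download('stopwords', quiet=True)
from nltk.corpus import stopwords

tokens = ['London', 'is', 'the', 'capital', 'of', 'England']
filtered = [t for t in tokens if t.lower() not in stopwords.words('english')]
print(filtered)    # ['London', 'capital', 'England']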
Step 6: Dependency parsing
Next comes dependency parsing, which is mainly used to find out how all the words in a
sentence are related to each other. To find the dependencies, we can build a tree and assign
a single word as the parent word; the main verb in the sentence acts as the root node.
POS tags cover verbs, adverbs, nouns, and adjectives and help indicate the meaning of
words in a grammatically correct way within a sentence.
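A minimal dependency-parsing and POS-tagging sketch with spaCy (assuming the small English model has been installed with: python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of the United Kingdom.")
for token in doc:
    # word, part of speech, dependency relation, and its head (parent) word
    print(token.text, token.pos_, token.dep_, "<-", token.head.text)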
Voice conversion
The captured speech signals of the speaker contain background noise from different
sources such as ambient noise, the microphone terminal, and the communication channel.
To remove the low background noise, the discretized speech signal of the spoken utterance
is passed through a second-order Butterworth IIR (infinite impulse response) highpass
digital filter (g[n]*h[n]). The choice of this IIR filter is based on its flexibility in handling
signals that are heavily nonlinear in nature, owing to its nonlinear phase characteristics,
and its flexibility in meeting constraints such as an arbitrary response. Since most of the
energy content of the voice signal is concentrated within the low frequency range, the
filter cut-off frequency is set at 0.40 kHz.
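A minimal sketch of this filter with SciPy (the 16 kHz sampling rate and the synthetic signal are illustrative assumptions; the cut-off matches the 0.40 kHz given above):

import numpy as np
from scipy.signal import butter, lfilter

fs = 16000                                            # assumed sampling rate
b, a = butter(N=2, Wn=400, btype='highpass', fs=fs)   # 2nd-order Butterworth, 400 Hz

t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 1000 * t)        # stand-in for the spoken utterance
hum = 0.5 * np.sin(2 * np.pi * 50 * t)       # low-frequency background noise
filtered = lfilter(b, a, speech + hum)       # g[n]*h[n]: the hum is attenuated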
The GMM for speaker recognition is formulated to compute the GMM parameters of the
extracted feature vectors of spoken utterances that best match the speech feature templates
in the system database. Several techniques may be used to estimate the parameters of the
GMM (mixture weights, mean vectors, covariance matrices) that describe the component
distribution of the extracted speech feature vectors. To establish the number of Gaussian
distributions useful for the GMM speaker model in the recognition task, experiments were
performed using a varying number of Gaussian distributions and MFCC feature vectors
from the utterance "increase volume upwards" for two speakers. Table 2 shows the
recognition rates obtained with a varying number of Gaussians. Based on the experimental
results, 20 Gaussians were found adequate for the GMM speaker models.
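A minimal sketch of fitting such a speaker model with scikit-learn (the random matrix stands in for the per-frame MFCC vectors of an enrolment utterance; diagonal covariances are an illustrative choice):

import numpy as np
from sklearn.mixture import GaussianMixture

mfccs = np.random.rand(500, 13)     # stand-in: one row of 13 MFCCs per frame

# EM estimates the mixture weights, mean vectors, and covariance matrices.
gmm = GaussianMixture(n_components=20, covariance_type='diag',
                      random_state=0).fit(mfccs)

# At recognition time, score a test utterance against each speaker's GMM and
# pick the speaker with the highest average log-likelihood.
print(gmm.score(mfccs))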
Algorithm used
1. ANN classification algorithm
The neural network (NN) model uses state transitions, association strengths, and transfer
functions; a key distinction from Markov chains is that Markov chains are serial, whereas
neural networks operate both serially and in parallel. Artificial neurons are the
fundamental units of an ANN: simple processors, known as neurons, that simulate the
behaviour of a biological nerve cell and are conceived as a model of the natural neuron.
The output signal is directed through the neuron's outgoing link, which splits into many
divisions that send the same signal; the outgoing divisions terminate at the incoming links
of other neurons in the network.
ANN Architecture
An ANN is a reasoning model whose data unit, the neuron, is patterned on the human
brain. In its simplest form, the NN consists of a single neuron whose variable synaptic
weights form a linear combination of its inputs.
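As a small worked example of that single-neuron linear combination (NumPy; the weights, bias, and sigmoid activation are illustrative):

import numpy as np

x = np.array([0.5, -1.2, 3.0])        # incoming signals
w = np.array([0.8, 0.4, -0.2])        # variable synaptic weights
b = 0.1                               # bias term

z = np.dot(w, x) + b                  # linear combination of the inputs
output = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation on the outgoing link
print(output)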
CHAPTER 7
CONCLUSION
Speech is the communication carrier used to express human thoughts, feelings, and ideas.
Stuttering, or stammering, is a disorder of speech which affects millions of people around the
globe. In the field of stuttered speech recognition, different machine learning models have been
applied for analysis and classification over the last few decades. In this study, different machine
learning and deep learning models and their application to stuttered speech recognition are
discussed. The major classifier used to classify different types of stuttering is the ANN. Deep
learning algorithms, discussed briefly in this study, have become very popular over traditional
machine learning algorithms for stuttering speech recognition, and deep neural networks can
be employed to classify different types of stuttering with better accuracy. There is still very
little research on removing the different types of stuttering from a speech signal: identification
of stuttering is required, but the main focus should be on its removal. Interjections,
prolongation-type stuttering, and unvoiced speech can be removed in different ways.
CHAPTER 8
FUTURE ENHANCEMENT
In future, stutter speech detection can be extended into an AI assistant that supports
people with the disorder through a corrective voice module, similar to Alexa and Siri.
CHAPTER 9
APPENDIX
9.1 CODING
In an earlier two-stage approach, 4-s speech fragments were fed to the network in two
stages: 40 fragments containing blockades in the pronunciation of words starting with the
consonants (p, b, t, d, k, and g) and repetitions of 1 to 11 stop consonants, together with the
same number of fluent speech fragments of the same duration. To reduce the dimensionality
of the input signal, a Kohonen network consisting of 21 input neurons and 25 output neurons
was used first. Then a multilayer perceptron with 171 input neurons, 53 hidden neurons, and
one output layer was examined, achieving 96.67% classification accuracy. In order to identify
disfluencies in the stuttered speech of children, a two-stage technique was used to build an
automatic recognition method: in the first stage, speech was segmented into words, and the
words were classified as fluent or disfluent using an ANN classifier.
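A simplified sketch of the first-stage Kohonen mapping (plain NumPy, winner-take-all without a neighbourhood function; the random vectors stand in for the 21-dimensional fragment features):

import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((25, 21))        # one weight vector per output neuron

def som_step(x, weights, lr=0.1):
    # Winner-take-all update: move the best-matching unit towards the input.
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    weights[winner] += lr * (x - weights[winner])
    return winner

for x in rng.random((200, 21)):       # stand-in fragment features
    som_step(x, weights)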
MAIN

import time

import cv2                      # imported in the original listing (unused below)
import numpy as np              # imported in the original listing (unused below)
import pyttsx3
import speech_recognition as sr
from tkinter import *           # Tk, Toplevel, Canvas, StringVar
from PIL import Image, ImageTk

from test import sample


class tk_master:
    def __init__(self):
        self.master = 'ar_master'
        # The extracted listing lost the title/colour assignments used by the
        # getters below; the values here are assumed.
        self.title = 'Stuttering Detection'
        self.titlec = 'Stuttering Detection'
        self.backround_color = '#000000'
        self.text_color = '#c0c0c0'
        self.backround_image = 'images/background_hd1.jpg'

    def get_title(self):
        return self.title

    def get_titlec(self):
        return self.titlec

    def get_backround_color(self):
        return self.backround_color

    def get_text_color(self):
        return self.text_color

    def get_backround_image(self):
        return self.backround_image

    def set_window_design(self):
        root = Tk()
        # Centre a fixed 780x500 window on the screen.
        w = 780
        h = 500
        ws = root.winfo_screenwidth()
        hs = root.winfo_screenheight()
        x = (ws / 2) - (w / 2)
        y = (hs / 2) - (h / 2)
        root.geometry('%dx%d+%d+%d' % (w, h, x, y))
        root.title(self.title)
        root.resizable(False, False)
        self.bg = ImageTk.PhotoImage(file=self.backround_image)
        canvas = Canvas(root, width=w, height=h)
        canvas.create_image(0, 0, image=self.bg, anchor='nw')
        canvas.pack(fill="both", expand=True)

        def clickHandler(event):
            # Open the chat window when the admin icon is clicked (the original
            # called the unbound class method; self.chat_app() is the fix).
            self.chat_app()

        image = Image.open('images/admin.png')
        my_img = ImageTk.PhotoImage(image)
        icon_id = canvas.create_image(390, 250, image=my_img)  # position assumed
        canvas.tag_bind(icon_id, '<Button-1>', clickHandler)
        root.mainloop()

    def chat_app(self):
        chat_app_root = Toplevel()
        get_data = sample()
        w = 780
        h = 500
        ws = chat_app_root.winfo_screenwidth()
        hs = chat_app_root.winfo_screenheight()
        x = (ws / 2) - (w / 2)
        y = (hs / 2) - (h / 2)
        chat_app_root.geometry('%dx%d+%d+%d' % (w, h, x, y))
        chat_app_root.title(get_data.get_title())
        chat_app_root.resizable(False, False)
        bg = ImageTk.PhotoImage(file='images/background_hd1.jpg')
        canvas = Canvas(chat_app_root, width=w, height=h)
        canvas.create_image(0, 0, image=bg, anchor='nw')
        canvas.bg = bg   # keep a reference so Tkinter does not discard the image
        canvas.pack(fill="both", expand=True)
        # Only the trailing fill=get_data.get_text_color() arguments of several
        # create_text(...) calls survived extraction; a single status label is
        # reconstructed here so the handlers below can update it.
        status_id = canvas.create_text(390, 460, text='Ready',
                                       fill=get_data.get_text_color())

        global e1, tree
        global w1, e2
        w1 = StringVar()

        def SpeakText(command):
            # Speak the given text through the default TTS voice.
            engine = pyttsx3.init()
            engine.say(command)
            engine.runAndWait()
            engine.stop()

        def SpeakText1(command):
            engine = pyttsx3.init()
            engine.say(command)
            engine.runAndWait()
            engine.stop()

        def voice_input():
            try:
                canvas.itemconfig(status_id, text='Loading...')
                canvas.update()
                text = ''
                # e1.delete(0, END)
                r = sr.Recognizer()
                # The microphone capture was lost in extraction and is
                # reconstructed here before recognition.
                with sr.Microphone() as source:
                    audio_data = r.listen(source)
                text = r.recognize_google(audio_data)
                previous_data = w1.get()
                # print(text)
                canvas.update()
                time.sleep(1)
                SpeakText1(text)
                canvas.itemconfig(status_id, text='Recognizing...')
                canvas.update()
                time.sleep(1)
            except Exception:
                msg = 'Sorry, I could not understand that.'   # assumed message
                SpeakText1(msg)
                # voice_input()

        def exit_program():
            voice_input()

        def test():
            get_data = tk_master()
            user = 'Welcome to stuttering detection'          # assumed greeting
            # w1.set(user)
            # SpeakText(user)
            # canvas.itemconfig(admin_user, text=user)
            canvas.itemconfig(status_id, text='Loading...')
            canvas.update_idletasks()
            engine = pyttsx3.init()
            engine.say(user)
            engine.runAndWait()
            engine.stop()

        chat_app_root.after_idle(test)
        chat_app_root.mainloop()


ar = tk_master()
root = ar.set_window_design()
TEST

import threading                # imported in the original listing (unused below)
import time                     # imported in the original listing (unused below)

import pyttsx3
import speech_recognition as sr


class sample:
    def __init__(self):
        self.master = 'ar_master'
        # As in MAIN, the lost title/colour assignments are assumed.
        self.title = 'Stuttering Detection'
        self.titlec = 'Stuttering Detection'
        self.backround_color = '#000000'
        self.text_color = '#c0c0c0'
        self.backround_image = 'images/background_hd1.jpg'

    def get_title(self):
        return self.title

    def get_titlec(self):
        return self.titlec

    def get_backround_color(self):
        return self.backround_color

    def get_text_color(self):
        return self.text_color

    def get_backround_image(self):
        return self.backround_image
# s = sample()
# s.chat_window()
9.2 SCREENSHOTS
[Screenshots of the running application]
CHAPTER 10
REFERENCES
[1] Van Riper C. The Nature of Stuttering. New Jersey: Prentice Hall, 1971.
[2] Sichang Jiang, Rui Gu. Speech Language Pathology. Beijing, 2005.
[3] Wingate M E, Howell P. Foundations of Stuttering. The Journal of the Acoustical Society
of America, 2002; 112: 1229-1231.
[4] Qingli Zang. The Evaluation and Treatment of Speech-Language Disorders. Shijiazhuang,
1991: 185-199.
[5] Bergl P, Cmejla R. Change Detection with Applications to Speech Analysis. Prague,
2005; 1: 1-4.
[7] Noth E, Niemann H, Haderlein T, et al. Automatic Stuttering Recognition Using Hidden
Markov Models. Sixth International Conference on Spoken Language Processing, 2000.
[8] Witt S M, Young S J. Phone-level Pronunciation Scoring and Assessment for Interactive
Language Learning. Speech Communication, 2000; 30: 95-108.
[10] Iverach L, et al. Comparison of Adults Who Stutter With and Without Social Anxiety
Disorder. J. Fluen. Disord., vol. 56, pp. 55-68, Jun. 2018, doi: 10.1016/j.jfludis.2018.03.001.