
Chapter 2

Signal Analysis
Selected Topics in Artificial Intelligence
Summer Semester 2025
Content
• Raw Data and PCM
• Time Framing
• Fourier Transform
• What is a Spectrogram?
• Feature Extraction – MFCC
1. Raw Data and PCM

• How is audio digitized?

Before analyzing a signal in the frequency domain, it must first be converted from analog to digital form. This process involves sampling and quantization, typically implemented using Pulse Code Modulation (PCM).
1.2 What is Pulse Code Modulation (PCM)?

• Pulse Code Modulation (PCM) is the standard method for converting analog audio signals into digital form.
It involves three steps:
1. Sampling the signal at regular time intervals,
2. quantizing each sample to the nearest discrete level,
3. and encoding the result into a binary stream (e.g., 010110...).
The final output is a digital signal that can be stored, transmitted, or analyzed by computers.
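A minimal NumPy sketch of these three steps, assuming for illustration a 1 kHz sine wave, a 16 kHz sampling rate, and 8-bit quantization (none of these values come from the slides):

import numpy as np

# Assumed parameters for illustration only
fs = 16_000          # sampling rate in Hz
duration = 0.01      # length of the signal in seconds
bits = 8             # quantization depth

# 1. Sampling: evaluate the analog signal at regular time intervals
t = np.arange(0, duration, 1 / fs)
analog = np.sin(2 * np.pi * 1000 * t)    # 1 kHz sine wave in [-1, 1]

# 2. Quantization: map each sample to the nearest of 2**bits discrete levels
levels = 2 ** bits
quantized = np.round((analog + 1) / 2 * (levels - 1)).astype(np.uint8)

# 3. Encoding: represent each quantized sample as a binary code word
encoded = [format(q, f"0{bits}b") for q in quantized]
print(encoded[:4])   # first few 8-bit code words, e.g. '10000000', '10110000', ...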
2. Time Framing
In audio processing, time framing involves segmenting a
continuous audio signal into smaller, overlapping or non-
overlapping blocks called frames. These frames are then
analyzed individually, allowing for processing of time-
varying characteristics of the audio. A typical frame
duration is 20-40 milliseconds, with a frame hop (the
amount of time advanced between frames) often around
10 milliseconds.
2.1 Steps in Time Framing and Preprocessing for Spectral Analysis

• Signal division: The audio signal, which is a continuous waveform, is divided into frames.
• Frame length and hop: Each frame has a specific length (e.g., 25 milliseconds) and a frame hop (e.g., 10 milliseconds). The hop determines how far the frame advances from one to the next. Overlapping frames are common to capture transitions more smoothly.
• Windowing: A window function is often applied to each frame to reduce discontinuities at the frame edges and minimize spectral leakage when performing Fourier analysis.
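A minimal NumPy sketch of framing and Hamming windowing, assuming a mono signal, a 16 kHz sampling rate, a 25 ms frame, and a 10 ms hop (parameters chosen for illustration):

import numpy as np

def frame_signal(x, fs=16_000, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    frame_len = int(fs * frame_ms / 1000)   # samples per frame
    hop = int(fs * hop_ms / 1000)           # samples advanced between frames
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hamming(frame_len)          # tapers frame edges, reduces spectral leakage
    return np.stack([x[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

# Example: 1 second of noise -> 98 frames of 400 samples each
x = np.random.randn(16_000)
print(frame_signal(x).shape)   # (98, 400)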
Tutorial for frame blocking

• A signal is sampled at 12 kHz, the frame size is chosen to be 20 ms, and adjacent frames are separated by 5 ms.
Calculate N (frame length in samples) and m (frame hop in samples).
(Answer: N = 12,000 × 0.020 = 240; m = 12,000 × 0.005 = 60.)
• Repeat the above when adjacent frames do not overlap.
(Answer: N = 240, m = 240.)
Frame Length (in samples)

• N = Sampling Rate × Frame Duration (in seconds)
• N: Number of samples in one frame
• Sampling Rate: Samples per second (e.g., 22,000 Hz)
• Frame Duration: Frame size in seconds (e.g., 15 ms = 0.015 s)
Frame Overlap (in samples)

• Overlap = Overlap Ratio × N
• Overlap Ratio: Fraction of the frame that overlaps with the next (e.g., 0.40 for 40%)
• N: Frame length in samples
Frame Hop (Step Size)

• m = N − Overlap
• m: Step size, i.e., how far we move forward to start the next frame
• N: Frame length in samples
• Overlap: Number of overlapping samples between frames
Class exercise
For a 22 kHz / 16-bit speech wave, the frame size is 15 ms and adjacent frames overlap by 40% of the frame size.
Calculate N and m.
Answer: Number of samples in one frame: N = 0.015 s × 22,000 Hz = 330.
Overlapping samples = 0.40 × 330 = 132, so m = N − 132 = 198.
Overlapping time = 132 × (1/22,000) s = 6 ms;
Time in one frame = 330 × (1/22,000) s = 15 ms.
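A short Python sketch that reproduces the formulas above and the numbers in this exercise:

def frame_params(fs_hz, frame_ms, overlap_ratio):
    """Return frame length N, overlap, and hop m, all in samples."""
    n = int(fs_hz * frame_ms / 1000)     # N = sampling rate x frame duration
    overlap = int(overlap_ratio * n)     # Overlap = overlap ratio x N
    m = n - overlap                      # m = N - Overlap
    return n, overlap, m

# Class exercise: 22 kHz sampling, 15 ms frames, 40% overlap
print(frame_params(22_000, 15, 0.40))   # (330, 132, 198)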
3. Fourier Transform
The Fourier Transform is a mathematical tool that decomposes a function (often a signal) into its constituent frequencies. It takes information from the "time domain" or "spatial domain" and transforms it into the "frequency domain". This transformation is widely used in fields such as signal processing, image analysis, and physics.
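A minimal sketch of this decomposition using NumPy's FFT on one windowed frame; the two-tone test signal and the 16 kHz sampling rate are assumptions for illustration:

import numpy as np

fs = 16_000                          # sampling rate (illustrative)
t = np.arange(0, 0.025, 1 / fs)      # one 25 ms frame
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

# FFT of the Hamming-windowed frame -> complex spectrum
spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
freqs = np.fft.rfftfreq(len(frame), d=1 / fs)   # frequency axis in Hz
magnitude = np.abs(spectrum)

# The two largest peaks sit at the 440 Hz and 2000 Hz components
print(freqs[np.argsort(magnitude)[-2:]])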
Applications of the Fourier Transform:

• Signal Processing: In audio processing, it helps identify the frequencies present in a sound, allowing for filtering or manipulation of specific frequencies. In radio and telecommunications, it is used to analyze and process signals.

• Image Analysis: In image processing, it can reveal the spatial frequencies or patterns within an image, helping with tasks like edge detection or noise reduction.

• Physics
What is a Spectrogram?
Definition:
• A spectrogram is a visual representation of how
the frequency content of a signal changes over time.
It is a powerful tool in audio signal processing, particularly
for analyzing speech and music.
Spectrogram Axes:
• X-axis (Horizontal): Time
• Y-axis (Vertical): Frequency
• Color/Intensity: Represents the power
(amplitude) or energy of the signal at a given time and
frequency
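A minimal sketch of computing and plotting a spectrogram with SciPy and Matplotlib; the chirp test signal and frame settings are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import chirp, spectrogram

fs = 16_000
t = np.arange(0, 2.0, 1 / fs)
x = chirp(t, f0=200, f1=4000, t1=2.0)        # tone sweeping 200 Hz -> 4 kHz

# STFT-based spectrogram: 25 ms frames with a 10 ms hop (240-sample overlap)
f, tt, Sxx = spectrogram(x, fs=fs, nperseg=400, noverlap=240)

plt.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12))   # color/intensity = power in dB
plt.xlabel("Time [s]")         # x-axis: time
plt.ylabel("Frequency [Hz]")   # y-axis: frequency
plt.show()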
Spectral Envelope

• The spectral envelope is the general shape of the frequency spectrum at a given time.
• By observing how this shape evolves, we can understand how the characteristics of the sound (e.g., pitch, formants) change over time.
Applications:

Spectrograms are widely used in various fields, including:

1. Audio analysis: Identifying different sounds (speech, music, etc.) and their characteristics.
2. Speech processing: Analyzing vocal sounds, understanding speech patterns, and developing speech recognition systems.
3. Machine learning: Extracting features from audio signals for tasks like audio classification and speaker recognition.
4. Feature Extraction – MFCC

MFCC stands for Mel-frequency Cepstral Coefficients. It is a feature used in automatic speech and speaker recognition.
MFCCs are a compact mathematical representation of the short-term spectral envelope shaped by the human vocal tract during speech. The process involves several steps to capture the essential characteristics of human speech, emphasizing the frequency ranges most discernible to the human ear.
How to compute MFCC?
To calculate MFCCs, we follow these steps:
• Pre-emphasize the signal: Amplify higher frequencies to balance the
spectrum.
• Framing: Break the signal into small, overlapping frames.
• Windowing: To soften the edges of each frame, apply a Hamming
window.
• FFT: Convert each frame from the time domain to the frequency
domain.
• Mel-filter bank: Apply overlapping triangular filters spaced according
to the Mel scale.
• Logarithm: To replicate the way a human ear reacts to sound strength,
take the logarithm of the filter bank outputs.
• DCT: Apply the DCT to the log Mel-spectrum to obtain the Mel-
frequency Cepstral Coefficients.
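A minimal NumPy/SciPy sketch that follows these steps end to end; the parameter choices (0.97 pre-emphasis, 25 ms frames with a 10 ms hop, 26 Mel filters, 13 coefficients, 512-point FFT) are common conventions assumed for illustration, not values given in the slides. A library such as librosa provides a production implementation.

import numpy as np
from scipy.fft import dct

def mfcc(signal, fs=16_000, frame_ms=25, hop_ms=10,
         n_filters=26, n_coeffs=13, n_fft=512):
    # 1. Pre-emphasis: boost high frequencies to balance the spectrum
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2-3. Framing and Hamming windowing
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.stack([x[i * hop : i * hop + frame_len] * np.hamming(frame_len)
                       for i in range(n_frames)])

    # 4. FFT -> power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 5. Mel filter bank: triangular filters equally spaced on the Mel scale
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # 6. Logarithm of the filter bank energies (mimics loudness perception)
    log_energies = np.log(power @ fbank.T + 1e-10)

    # 7. DCT -> keep the first n_coeffs cepstral coefficients
    return dct(log_energies, type=2, axis=1, norm='ortho')[:, :n_coeffs]

# Example: MFCCs of one second of noise -> (98, 13) matrix
print(mfcc(np.random.randn(16_000)).shape)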
