Unit 1

The document provides an overview of various text processing techniques essential for natural language processing (NLP), including text cleaning, tokenization, word tagging, and ensemble learning. It details methods for improving model accuracy through preprocessing, such as removing noise and normalizing text, as well as advanced techniques like sequential tagging and ensemble methods like Random Forests. Additionally, it covers the process of text classification and sentiment analysis, highlighting tools and libraries used in these tasks.


Slide 1: Cleaning Text Data – Introduction

Title: Why Clean Text Data?


 Raw text data is noisy and unstructured.
 Essential for improving model accuracy.
 Preprocessing is the foundation of all NLP tasks.

🔹 Slide 2: Common Issues in Raw Text


Title: What Needs to be Cleaned?
 Punctuation marks (e.g., !?., etc.)
 Numbers and symbols (123, $, %, &)
 Case inconsistencies (e.g., "Text" vs "text")
 Extra whitespaces or line breaks

🔹 Slide 3: Lowercasing Text


Title: Text Normalization – Lowercasing
 Convert all characters to lowercase.
 Helps reduce dimensionality.
 Example: “NLP is Fun” → “nlp is fun”

🔹 Slide 4: Removing Stopwords


Title: Stopword Removal
 Stopwords: common words with little semantic value.
 Examples: “is”, “the”, “and”, “of”, “in”
 Use libraries: nltk.corpus.stopwords, spaCy
 Can improve signal-to-noise ratio.
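
A minimal sketch of stopword removal with NLTK (the sentence is illustrative):
python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

stop_words = set(stopwords.words('english'))
tokens = word_tokenize("NLP is one of the most useful fields in AI")
filtered = [w for w in tokens if w.lower() not in stop_words]
print(filtered)  # e.g., ['NLP', 'one', 'useful', 'fields', 'AI']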

🔹 Slide 5: Removing Special Characters


Title: Remove Punctuation & Symbols
 Use regex to strip characters like !@#$%^&*()
 Example:
o Input: “Hi! How are you? :)”
o Output: “Hi How are you”
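
A quick regex sketch of the example above:
python
import re

text = "Hi! How are you? :)"
cleaned = re.sub(r'[^\w\s]', '', text).strip()  # strip punctuation/symbols
print(cleaned)  # "Hi How are you"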

🔹 Slide 6: Handling Contractions


Title: Expand Contractions
 Replace short forms with full words:
o “don’t” → “do not”
o “it’s” → “it is”
 Use packages like contractions in Python.
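
A tiny sketch assuming the third-party contractions package (pip install contractions):
python
import contractions

print(contractions.fix("don't"))  # do not
print(contractions.fix("it's"))   # it is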

🔹 Slide 7: Spelling Correction


Title: Correcting Spelling Errors
 Important for user-generated content (social media, reviews)
 Tools: TextBlob, SymSpell, Hunspell
 Example: “recieve” → “receive”
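
A minimal sketch using TextBlob's built-in corrector (suggestions are probabilistic, so output may vary):
python
from textblob import TextBlob

print(TextBlob("I will recieve it").correct())  # e.g., "I will receive it"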

🔹 Slide 8: Removing HTML, URLs, Digits


Title: Clean Web-Sourced Text
 Remove HTML tags: <div>, <p>, etc.
 Strip URLs and emails: http://, example@gmail.com
 Remove or replace numbers
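
A regex sketch of the cleanups above (the patterns are illustrative, not exhaustive):
python
import re

text = '<p>Visit http://example.com or mail example@gmail.com. Offer ends 123!</p>'
text = re.sub(r'<[^>]+>', '', text)           # HTML tags
text = re.sub(r'http\S+|www\.\S+', '', text)  # URLs
text = re.sub(r'\S+@\S+', '', text)           # emails
text = re.sub(r'\d+', '', text)               # digits
print(text)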

🔹 Slide 9: Final Cleaned Text Example


Title: Before and After Cleaning
Raw Input:
"Hey there! I can't believe it’s only $5. Visit: www.example.com"
Cleaned Output:
"hey believe only visit"

Slide 1: What is Tokenization?


 Splitting text into smaller units called tokens
 Tokens can be words, characters, or subwords
 First step in most NLP pipelines

Slide 2: Why Tokenize Text?


 Converts unstructured text into structured format
 Prepares data for analysis or modeling
 Essential for feature extraction

Slide 3: Major Tokenization Techniques


 Word Tokenization: "I love AI" → [I, love, AI]
 Sentence Tokenization: Based on punctuation
 Character Tokenization: "data" → [d, a, t, a]
 Subword Tokenization: Used in BERT

Slide 4: Word Tokenization – Example


 Input: “Text processing is essential.”
 Output: [“Text”, “processing”, “is”, “essential”]
 Tools: nltk.word_tokenize(), spaCy
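
A one-line sketch with NLTK (note that NLTK keeps the final period as its own token):
python
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
print(word_tokenize("Text processing is essential."))
# ['Text', 'processing', 'is', 'essential', '.']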

Slide 5: Sentence Tokenization – Example


 Input: “This is NLP. It’s very useful.”
 Output: [“This is NLP.”, “It’s very useful.”]
 Tools: nltk.sent_tokenize(), spaCy

Slide 6: Subword Tokenization


 Splits uncommon words into parts
 Handles out-of-vocabulary (OOV) words
 Example: “playing” → [“play”, “##ing”] in BERT

Slide 7: Tokenization Tools


 NLTK – Classical NLP toolkit
 spaCy – Fast and efficient
 HuggingFace Transformers – For BERT, GPT
 Regex – Custom tokenization rules

Slide 8: Tokenization Challenges


 Punctuation: “hello!” vs “hello”
 Contractions: “don’t” may be kept whole or split as “do” + “n’t”
 Emojis, hashtags, and URLs in social text

Slide 9: Summary: Tokenization Essentials


 Converts raw text into structured tokens
 Choose tokenizer based on task and language
 Supports downstream tasks like classification
Slide 1: Introduction to Word Tagging
Title: What is Tagging in NLP?
Content:
 Assigning labels to words based on their role in a sentence
 Common tagging task: Part-of-Speech (POS) tagging
 Helps understand the structure and meaning of text

✅ Slide 2: What is Part-of-Speech (POS) Tagging?


Title: POS Tagging Explained
Content:
 Identifies grammatical role of each word
 Tags like Noun, Verb, Adjective, etc.
 Example:
o Sentence: “The dog barked loudly.”
o Tags: Det Noun Verb Adverb

✅ Slide 3: Common POS Tags


Title: Common POS Tags
Content:
 NN: Noun (dog, car)
 VB: Verb (run, play)
 JJ: Adjective (blue, tall)
 RB: Adverb (quickly, loudly)
 DT: Determiner (the, a)
 IN: Preposition (in, on)
✅ Slide 4: Tagging Techniques
Title: POS Tagging Techniques
Content:
 Rule-based Taggers – Manually defined grammar rules
 Statistical Taggers – Use probability (e.g., HMM)
 Machine Learning Taggers – CRF, Bi-LSTM, Transformers

✅ Slide 5: Tools for POS Tagging


Title: Libraries for Tagging
Content:
 NLTK: nltk.pos_tag()
 spaCy: doc[i].pos_, doc[i].tag_
 Stanza, Flair, Transformers (BERT models)
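
A minimal NLTK sketch of POS tagging (tag names follow the Penn Treebank set):
python
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The dog barked loudly.")
print(nltk.pos_tag(tokens))
# e.g., [('The', 'DT'), ('dog', 'NN'), ('barked', 'VBD'), ('loudly', 'RB'), ('.', '.')]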

✅ Slide 6: Categorizing Words


Title: Beyond POS – Categorizing Words
Content:
 Semantic categories: Names, Locations, Numbers
 Morphological info: Tense, Number, Gender
 Syntactic roles: Subject, Object, Modifier

✅ Slide 7: Applications of Word Tagging


Title: Why Tag Words?
Content:
 Grammar checking
 Named Entity Recognition (NER)
 Information extraction
 Dependency parsing and question answering
✅ Slide 8: Example – Tagged Sentence
Title: Example Output
Content:
 Input Sentence: “She sells sea shells on the seashore.”
 Tagged Output:
o She/PRP sells/VBZ sea/NN shells/NNS on/IN the/DT seashore/NN

✅ Slide 9: Summary
Title: Summary: Tagging & Categorization
Content:
 POS tagging is crucial for syntactic analysis
 Helps extract grammar and meaning from text
 Tools: NLTK, spaCy, BERT
 Used in NER, parsing, and classification tasks
Slide 1: Introduction
Title: What is Sequential Tagging?
Content:
 Tags words based on their context within a sentence
 Useful for tasks like POS tagging, NER, and chunking
 Handles dependencies: "Time flies like an arrow."

✅ Slide 2: Why Use Sequential Tagging?


Title: Importance of Sequential Models
Content:
 Single-word taggers ignore context
 Sequential models predict a sequence of labels
 Better handling of ambiguous words
 Example:
o “bear” in “I saw a bear” (noun) vs “They bear gifts” (verb)

✅ Slide 3: Techniques for Sequential Tagging


Title: Common Algorithms
Content:
 Hidden Markov Models (HMMs) – Probabilistic sequence models
 Conditional Random Fields (CRFs) – Discriminative models
 Recurrent Neural Networks (RNNs) – LSTM, BiLSTM
 Transformers (e.g., BERT) – Attention-based context modeling

✅ Slide 4: Backoff Tagging – Concept


Title: What is Backoff Tagging?
Content:
 A fallback strategy used when primary tagger fails
 Combines multiple taggers in sequence
 Uses simplest model if others can’t predict

✅ Slide 5: Tagger Cascade Example


Title: Tagger Backoff Pipeline
Content:
1. DefaultTagger → assigns most frequent tag (e.g., NN)
2. UnigramTagger → uses frequency of individual words
3. BigramTagger → uses context (previous tag + word)
 If one fails, it backs off to the next

✅ Slide 6: Code Example – NLTK


Title: Backoff Tagging in NLTK (Python)
Content:
python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]

default = DefaultTagger('NN')
unigram = UnigramTagger(train_data, backoff=default)
bigram = BigramTagger(train_data, backoff=unigram)

print(bigram.tag(['The', 'dog', 'barked']))

✅ Slide 7: Benefits of Backoff Tagging


Title: Why Use Backoff?
Content:
 Increases accuracy by using multiple strategies
 Provides robust fallback for unknown words
 Can be customized for task-specific pipelines

✅ Slide 8: Applications
Title: Where is Sequential Tagging Used?
Content:
 POS tagging
 Named Entity Recognition (NER)
 Chunking (shallow parsing)
 Medical/Legal document tagging

✅ Slide 9: Summary
Title: Summary: Sequential and Backoff Tagging
Content:
 Sequential tagging uses context to improve prediction
 Backoff combines multiple taggers in a hierarchy
 Widely used in syntactic/semantic analysis pipelines
Slide 1: Introduction to Ensemble Learning
Title: What is Ensemble Learning?
Content:
 Combines predictions from multiple models
 Goal: improve accuracy, robustness, and stability
 Two main types:
o Bagging (Bootstrap Aggregation)
o Boosting (sequential learning)

✅ Slide 2: What is Bagging?


Title: Bagging (Bootstrap Aggregation)
Content:
 Builds multiple models using random samples of training data (with
replacement)
 Trains models independently and in parallel
 Final output by voting (classification) or averaging (regression)
 Reduces variance, helps avoid overfitting

✅ Slide 3: Introduction to Random Forests


Title: What is a Random Forest?
Content:
 Ensemble of Decision Trees
 Uses bagging + feature randomness
 Each tree trained on a bootstrapped dataset
 Combines predictions from all trees for final decision

✅ Slide 4: Random Forest – Key Characteristics


Title: Why Random Forest Works
Content:
 Handles high-dimensional data well
 Reduces overfitting from individual trees
 Provides feature importance ranking
 Scalable and parallelizable

✅ Slide 5: Random Sampling Process


Title: Bootstrap Sampling
Content:
 Each tree sees a random subset of the training data
 Samples with replacement
 Introduces diversity among trees

✅ Slide 6: Random Feature Selection


Title: Feature Randomness in Forests
Content:
 At each split, a random subset of features is chosen
 Helps reduce correlation between trees
 Promotes decorrelation and increases model diversity
✅ Slide 7: Final Prediction
Title: Aggregating Results
Content:
 Classification: Majority voting among trees
 Regression: Averaged predictions
 More trees → better stability and generalization
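
A short scikit-learn sketch of the voting behaviour described above (the synthetic data is illustrative):
python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
print(clf.predict(X[:5]))            # majority vote across 100 trees
print(clf.feature_importances_[:3])  # feature importance ranking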

✅ Slide 8: Advantages of Random Forests


Title: Why Use Random Forests?
Content:
 High accuracy
 Handles both numerical and categorical data
 Robust to noise and missing values
 No need for scaling or normalization

✅ Slide 9: Applications
Title: Use Cases of Random Forests
Content:
 Text classification
 Medical diagnosis
 Fraud detection
 Feature selection and ranking

✅ Slide 10: Summary


Title: Bagging with Random Forests – Recap
Content:
 Random Forest = Bagging + Feature Randomness
 Ensemble of diverse decision trees
 Improves accuracy, reduces overfitting
 Used in various machine learning applications
Slide 1: Introduction
Title: What is Text Classification?
Content:
 Assigning predefined labels to text documents
 Examples:
o Spam vs. Ham
o Positive vs. Negative sentiment
o News categorization (e.g., sports, tech, politics)
 Supervised Machine Learning approach

✅ Slide 2: Text Classification Pipeline


Title: Typical Pipeline Steps
Content:
1. Data Collection
2. Text Cleaning & Preprocessing
3. Feature Extraction
4. Model Selection & Training
5. Evaluation
6. Prediction & Deployment

✅ Slide 3: Data Preprocessing


Title: Preprocessing the Raw Text
Content:
 Lowercasing
 Removing punctuation, numbers, stopwords
 Tokenization
 Stemming or Lemmatization

✅ Slide 4: Feature Extraction


Title: Representing Text as Features
Content:
 Bag of Words (BoW)
 TF-IDF (Term Frequency–Inverse Document Frequency)
 Word Embeddings: Word2Vec, GloVe
 Transformer Embeddings: BERT

✅ Slide 5: Model Selection


Title: Choosing a Classifier
Content:
 Naive Bayes: fast, interpretable
 SVM: handles high-dimensional data well
 Logistic Regression: strong baseline
 Deep Learning: LSTM, CNN, BERT for complex tasks

✅ Slide 6: Training the Classifier


Title: Model Training
Content:
 Use preprocessed features and labels
 Fit model on training data
 Tune hyperparameters using cross-validation
 Tools: scikit-learn, Keras, PyTorch
✅ Slide 7: Evaluating the Model
Title: Model Evaluation Metrics
Content:
 Accuracy: Overall correctness
 Precision & Recall: Class-specific performance
 F1-Score: Harmonic mean of precision and recall
 Confusion Matrix: Visual error analysis

✅ Slide 8: Deployment
Title: Using the Trained Model
Content:
 Save model using joblib, pickle, or export to ONNX
 Build a prediction pipeline
 Integrate into web apps, chatbots, or APIs

✅ Slide 9: Example Code Snippet (sklearn)


Title: Code: Naive Bayes Classifier
python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

text_clf = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', MultinomialNB())
])
text_clf.fit(train_texts, train_labels)
preds = text_clf.predict(test_texts)

✅ Slide 10: Summary


Title: Building a Text Classifier – Recap
Content:
 Clean → Vectorize → Train → Evaluate
 Naive Bayes, SVM, Deep Learning models available
 Evaluate using precision, recall, F1
 Easily deployable for real-world tasks
Slide 1: What is Sentiment Analysis?
Title: Introduction to Sentiment Analysis
Content:
 Task of classifying the emotional tone of text
 Common labels: Positive, Negative, Neutral
 Applications:
o Product reviews
o Social media monitoring
o Customer feedback

✅ Slide 2: Tools for Sentiment Analysis


Title: Common Libraries and Models
Content:
 TextBlob – Simple, rule-based sentiment analysis
 VADER (NLTK) – Best for social media, emojis
 Transformers (BERT, RoBERTa) – Deep learning models
 HuggingFace Transformers – Pretrained SOTA models

✅ Slide 3: Simple Sentiment Analysis using TextBlob


Code:
python
from textblob import TextBlob

sentence = "I absolutely love this product!"


blob = TextBlob(sentence)

print(blob.sentiment)
Output:
Sentiment(polarity=0.625, subjectivity=0.6)
 Polarity: [-1.0, 1.0] → negative to positive
 Subjectivity: [0.0, 1.0] → fact to opinion

✅ Slide 4: Rule-Based Example using VADER (NLTK)


Code:
python
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()
sentence = "The movie was not good, it was fantastic!"

score = sia.polarity_scores(sentence)
print(score)
Output:
{'neg': 0.0, 'neu': 0.269, 'pos': 0.731, 'compound': 0.8481}
 compound > 0.05 → positive
 compound < -0.05 → negative
 Else → neutral

✅ Slide 5: Sentiment Using BERT (Transformers)


Code (HuggingFace Transformers):
python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I'm extremely disappointed with the service.")
print(result)
Output:
[{'label': 'NEGATIVE', 'score': 0.9982}]
 Deep learning-based, context-aware sentiment detection
 More accurate for long or ambiguous text

✅ Slide 6: Summary
Title: Sentiment Analysis – Summary
Content:
 TextBlob: Quick and easy for beginners
 VADER: Best for social/web content
 BERT-based models: Deep and accurate
 Choose tool based on complexity and domain
Slide 1: What is Topic Modeling?
Title: Introduction to Topic Modeling
Content:
 Unsupervised learning technique to discover abstract topics in a
collection of documents
 Helps summarize and organize large datasets
 Common algorithms:
o Latent Dirichlet Allocation (LDA)
o Non-negative Matrix Factorization (NMF)

✅ Slide 2: Why Topic Modeling?


Title: Applications of Topic Modeling
Content:
 Automatically group documents by topic
 Extract themes from articles, reviews, or tweets
 Used in:
o News aggregation
o Recommender systems
o Exploratory text analysis

✅ Slide 3: Preprocessing Text for Topic Modeling


Title: Cleaning Steps
Content:
 Lowercasing
 Removing punctuation, numbers
 Removing stopwords
 Lemmatization
 Tokenization

✅ Slide 4: LDA: Latent Dirichlet Allocation


Title: How LDA Works
Content:
 Assumes each document is a mixture of topics
 Each topic is a mixture of words
 Outputs:
o Topics as word clusters
o Distribution of topics per document

✅ Slide 5: LDA Implementation in Python


Code:
python
import gensim
from gensim import corpora
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Requires: nltk.download('punkt') and nltk.download('stopwords')

# Sample documents
docs = [
    "I love machine learning and natural language processing.",
    "Deep learning allows machines to understand human language.",
    "Text mining is a core part of NLP applications.",
    "Topic modeling finds hidden structure in documents."
]

# Preprocessing: lowercase, keep alphabetic tokens, drop stopwords
stop_words = set(stopwords.words('english'))
texts = [[word for word in word_tokenize(doc.lower())
          if word.isalpha() and word not in stop_words] for doc in docs]

# Create dictionary and corpus
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train LDA model
lda_model = gensim.models.LdaModel(corpus, num_topics=2,
                                   id2word=dictionary, passes=15)

# Print topics
topics = lda_model.print_topics(num_words=5)
for topic in topics:
    print(topic)
✅ Slide 6: Sample Output
Title: Sample LDA Topics
Output Example:
Topic 0: 0.20*"language" + 0.15*"learning" + 0.10*"human" + ...
Topic 1: 0.25*"topic" + 0.18*"documents" + 0.12*"modeling" + ...
 Each topic is a list of top keywords
 Helps interpret the dominant themes

✅ Slide 7: Visualizing Topics (Optional)


Title: Topic Visualization Tools
Content:
 pyLDAvis – interactive visualization of topics
python
import pyLDAvis.gensim_models
pyLDAvis.enable_notebook()
pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)

✅ Slide 8: Summary
Title: Topic Modeling Recap
Content:
 Helps uncover hidden patterns in unstructured text
 LDA is the most widely used technique
 Use topic keywords to label or summarize content
 Great for document clustering and thematic analysis
UNIT 2
Step-by-Step Python Code to Apply FFT and Plot
🔧 1. Import Required Libraries
python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.fft import fft, fftfreq

📥 2. Load the Audio File (WAV format)


python
# Replace with your audio file path
sample_rate, signal = wavfile.read("example.wav")

# If stereo, take only one channel
if len(signal.shape) > 1:
    signal = signal[:, 0]

print(f"Sample Rate: {sample_rate} Hz")
print(f"Signal Duration: {len(signal)/sample_rate:.2f} seconds")

⚙️3. Apply Fourier Transform


python
n = len(signal) # Length of the signal
yf = fft(signal) # Compute FFT
xf = fftfreq(n, 1 / sample_rate) # Frequency bins

📊 4. Plot the Frequency Spectrum


python
plt.figure(figsize=(12, 6))
plt.plot(xf[:n//2], np.abs(yf[:n//2])) # Only positive frequencies
plt.title("Frequency Spectrum")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Amplitude")
plt.grid()
plt.show()

📌 Key Notes
 FFT transforms a signal from time domain to frequency domain.
 np.abs(yf) gives the magnitude of each frequency component.
 You usually plot only the positive frequencies for interpretation.
PowerPoint-style Breakdown: Generating Audio Signals
✅ Slide 1: Introduction
Title: Why Generate Audio Signals?
Content:
 To simulate tones, speech, or noise for testing
 Useful for audio classification, model training
 Controlled frequency, amplitude, and duration

✅ Slide 2: Basic Parameters


Title: Custom Signal Parameters
Content:
 Frequency (Hz) – Pitch of the sound (e.g., 440 Hz = A4)
 Amplitude – Loudness of the signal
 Sampling Rate (Hz) – How many samples per second (e.g., 44100 Hz)
 Duration (sec) – Length of the signal

✅ Slide 3: Python Code – Generate a Sine Wave


python
import numpy as np
from scipy.io.wavfile import write

# Custom parameters
freq = 440 # Frequency in Hz (A4)
duration = 2 # Duration in seconds
sample_rate = 44100 # Samples per second
amplitude = 0.5 # Amplitude (0 to 1)

# Time axis
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)

# Generate sine wave


signal = amplitude * np.sin(2 * np.pi * freq * t)

# Scale to 16-bit PCM format and save


scaled = np.int16(signal * 32767)
write("sine_wave.wav", sample_rate, scaled)

✅ Slide 4: Output
File Created: sine_wave.wav
 Frequency: 440 Hz
 Duration: 2 seconds
 Format: 16-bit PCM WAV
 You can play it using any media player

✅ Slide 5: Visualizing the Signal


python
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.plot(t[:1000], signal[:1000]) # Show only a small segment
plt.title("Generated Sine Wave (440 Hz)")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.grid()
plt.show()

✅ Slide 6: Summary
Title: Audio Signal Generation – Recap
Content:
 Define frequency, duration, amplitude, sampling rate
 Use numpy for waveform generation
 Save with scipy.io.wavfile.write()
 Useful for creating synthetic test data
Parameters You Can Set
 Frequency – Controls pitch of the sound (e.g., 440 Hz = A4).
 Amplitude – Controls loudness (range: 0 to 1 for normalized).
 Sample Rate – How many audio samples per second (commonly 44100
Hz).
 Duration – Total length of the sound in seconds.

✅ Python Code: Create the Audio Sample


python
import numpy as np
from scipy.io.wavfile import write
import matplotlib.pyplot as plt
# Parameters
frequency = 440 # Frequency in Hz (e.g., A4 note)
amplitude = 0.7 # Amplitude (0.0 to 1.0)
duration = 2 # Duration in seconds
sample_rate = 44100 # Sampling rate in Hz

# Time array
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)

# Generate sine wave


signal = amplitude * np.sin(2 * np.pi * frequency * t)

# Save to WAV file


scaled_signal = np.int16(signal * 32767) # Convert to 16-bit PCM format
write("custom_audio_sample.wav", sample_rate, scaled_signal)

📊 Visualize the Waveform


python
plt.figure(figsize=(10, 4))
plt.plot(t[:1000], signal[:1000]) # Plot only first 1000 samples for clarity
plt.title(f"Sine Wave - {frequency}Hz")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.grid(True)
plt.show()
✅ Output
 File created: custom_audio_sample.wav
 Duration: 2 seconds
 Frequency: 440 Hz
 Amplitude: 0.7
 Sampling Rate: 44100 Hz
Slide 1: Introduction to Audio Features
Title: Why Extract Features from Audio?
Content:
 Raw audio is high-dimensional and noisy
 Feature extraction simplifies signal while preserving key info
 MFCC and Filter Banks are widely used in speech/audio processing

✅ Slide 2: What is MFCC?


Title: Mel-Frequency Cepstral Coefficients (MFCC)
Content:
 Mimics human auditory system
 Captures timbral and spectral features of audio
 Often used in speech recognition, emotion detection

✅ Slide 3: MFCC Extraction Steps


Title: MFCC Extraction Pipeline
Content:
1. Pre-emphasis
2. Framing & Windowing
3. Fast Fourier Transform (FFT)
4. Apply Mel filter bank
5. Logarithm of energies
6. Discrete Cosine Transform (DCT)

✅ Slide 4: Mel Filter Banks


Title: What Are Filter Banks?
Content:
 A set of triangular filters spaced along Mel scale
 Each filter captures energy in a frequency band
 Used before computing MFCCs

✅ Slide 5: Mel Scale


Title: Mel Scale – Human Perception
Content:
 Scale that models human ear sensitivity to frequency
 More resolution at lower frequencies
 Formula:
mel(f) = 2595 * log10(1 + f / 700)
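
A direct translation of the formula into Python (the sample frequencies are illustrative):
python
import numpy as np

def hz_to_mel(f):
    # mel(f) = 2595 * log10(1 + f / 700)
    return 2595 * np.log10(1 + f / 700)

print(hz_to_mel(440))   # ~549.6 mel
print(hz_to_mel(4400))  # ~2238.2 mel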

✅ Slide 6: Python Code – Compute MFCC using Librosa


python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load audio
y, sr = librosa.load("audio.wav")
# Compute MFCC
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Display
librosa.display.specshow(mfccs, x_axis='time')
plt.colorbar()
plt.title("MFCC")
plt.show()

✅ Slide 7: Filter Bank Energies in Python


python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# y, sr loaded as in the previous slide

# Compute Mel spectrogram (Filter Bank)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)

# Convert to dB
S_dB = librosa.power_to_db(S, ref=np.max)

# Display
librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr)
plt.colorbar()
plt.title("Mel Filter Bank")
plt.show()

✅ Slide 8: Summary
Title: MFCC & Filter Bank Features – Summary
Content:
 MFCCs are compact, cepstral representations of sound
 Filter banks model perceptual frequency bands
 Both are widely used in speech/audio ML pipelines
 Extractable using librosa, python_speech_features, etc.
Slide 1: Introduction to HMM
Title: What is a Hidden Markov Model?
Content:
 A statistical model where the system is a Markov process with hidden
states
 Consists of:
o Hidden states (e.g., phonemes, words)
o Observable events (e.g., speech signals, MFCCs)
o Transition probabilities between states

✅ Slide 2: HMM Applications


Title: Where is HMM Used?
Content:
 Speech recognition (word/phoneme sequences)
 POS tagging and NLP
 Gesture and handwriting recognition
 Bioinformatics (DNA sequencing)

✅ Slide 3: HMM Components


Title: Components of an HMM
Content:
 N: Number of hidden states
 M: Number of observation symbols
 A: State transition probability matrix
 B: Observation probability matrix
 π: Initial state distribution

✅ Slide 4: Training HMM – Learning the Model


Title: Training HMM Parameters
Content:
 Goal: Learn A, B, and π from data
 Algorithms:
o Baum-Welch (an EM algorithm) – finds optimal parameters from
observed sequences
o Iterative update of expected counts until convergence

✅ Slide 5: HMM Prediction – Decoding


Title: Predicting with HMM
Content:
 Goal: Find most likely sequence of hidden states given observations
 Algorithm:
o Viterbi Algorithm – dynamic programming
o Outputs the optimal state path (e.g., predicted word sequence)

✅ Slide 6: Python Code – HMM Using hmmlearn


python
from hmmlearn import hmm
import numpy as np

# Simulated MFCC-like observations
observations = np.random.rand(100, 13)  # 100 time steps, 13 features

# Define and train a Gaussian HMM
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(observations)

# Predict state sequence
states = model.predict(observations)
print(states)

✅ Slide 7: HMM Summary


Title: HMM – Summary
Content:
 Models sequences with hidden structure
 Learns transitions and emissions
 Useful in temporal pattern recognition
 Trained with Baum-Welch, predicted with Viterbi
 Python library: hmmlearn, pomegranate
MFCC Features – Introduction
 MFCC = Mel-Frequency Cepstral Coefficients
 Widely used feature in speech/audio processing
 Captures timbral and spectral characteristics

2. Why Use MFCC?


 Models human hearing perception
 Compact and informative representation
 Common in speech recognition, speaker identification

3. Steps to Extract MFCC


1. Pre-emphasis
2. Framing and windowing
3. FFT (Fast Fourier Transform)
4. Apply Mel filter bank
5. Logarithm of filter bank energies
6. Discrete Cosine Transform (DCT)

4. Understanding the Mel Scale


 Mimics human perception of pitch
 Higher resolution at lower frequencies
 Formula:
mel(f) = 2595 * log10(1 + f / 700)

5. MFCCs in Python using Librosa


python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load audio
y, sr = librosa.load('audio.wav')
# Compute MFCC
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Display
librosa.display.specshow(mfcc, x_axis='time')
plt.colorbar()
plt.title("MFCC")
plt.show()

6. Typical Use of MFCCs


 Input features for machine learning models (SVM, HMM, DNN)
 Audio classification tasks
 Emotion and language identification

7. MFCC Summary
 Key feature set for audio machine learning
 Represents audio in perceptual space
 Computed using Mel scale and DCT
 Libraries: librosa, scipy, python_speech_features
UNIT 3
Dissecting Time Series and Sequential Data – Detailed Notes

1. What is Time Series Data?


 Definition: A time series is a sequence of data points collected or
recorded at time intervals, typically equally spaced (e.g., daily, monthly).
 Examples:
o Stock market prices
o Temperature records
o Sales over months
o Heart rate signals

2. Characteristics of Time Series Data


 Trend: Long-term upward or downward movement in the data.
 Seasonality: Repeating short-term cycle or pattern (e.g., yearly sales
peak during holidays).
 Cyclic Patterns: Long-term oscillations not of fixed period.
 Noise: Random variation that cannot be explained.

3. Sequential Data vs. Time Series Data


Aspect            Time Series                      Sequential Data
Time dependence   Ordered by timestamp             Ordered, but not necessarily timestamped
Examples          Weather, finance, traffic flow   Text, DNA sequences, sensor readings
Processing Tools  ARIMA, Prophet, LSTM             RNN, HMM, Transformer, etc.

4. Preprocessing Time Series Data


 Handling Missing Values: Interpolation, forward/backward fill.
 Smoothing: Moving average, exponential smoothing.
 Resampling: Changing the frequency (e.g., from daily to weekly).
 Normalization: Scaling values for better model performance.
 Differencing: Removes trends and makes the series stationary.

5. Feature Engineering for Time Series


 Lag Features: Use previous values as features (e.g., y(t-1), y(t-2)).
 Rolling Statistics: Compute rolling mean, std, min, max over windows.
 Time-based Features:
o Day of week, month, quarter
o Holiday flag
 Fourier/Wavelet Transforms: Capture periodic components.
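
A minimal pandas sketch of the lag, rolling, and time-based features above (the series is synthetic):
python
import pandas as pd

# Hypothetical daily series for illustration
idx = pd.date_range('2023-01-01', periods=10, freq='D')
df = pd.DataFrame({'y': range(10)}, index=idx)

df['lag_1'] = df['y'].shift(1)                        # y(t-1)
df['lag_2'] = df['y'].shift(2)                        # y(t-2)
df['roll_mean_3'] = df['y'].rolling(window=3).mean()  # rolling mean
df['dayofweek'] = df.index.dayofweek                  # time-based feature
print(df.head())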

6. Statistical Modeling of Time Series


a. ARIMA (AutoRegressive Integrated Moving Average)
 AR: Autoregressive part (uses past values)
 I: Integrated (differencing to make series stationary)
 MA: Moving average (uses past error terms)
b. Seasonal ARIMA (SARIMA)
 Extension of ARIMA for seasonal components.
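
A hedged statsmodels sketch of fitting an ARIMA(1,1,1) model (the random-walk series is synthetic):
python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic random-walk series for illustration
idx = pd.date_range('2023-01-01', periods=100, freq='D')
series = pd.Series(np.cumsum(np.random.randn(100)), index=idx)

# order=(p, d, q): AR lags, differencing, MA lags
model = ARIMA(series, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))  # 5-step-ahead forecast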

7. Machine Learning Models for Sequential Data


 Recurrent Neural Networks (RNNs): Preserve memory of previous
inputs.
 LSTM (Long Short-Term Memory): Solves the vanishing gradient problem
in RNNs.
 GRU (Gated Recurrent Unit): Simpler and faster alternative to LSTM.
 1D CNN: Useful for feature extraction from fixed-length sequences.

8. Transformer Models for Sequence Tasks


 Self-attention mechanism allows focusing on different parts of input
sequence.
 Useful in:
o Time series forecasting
o Text classification
o Event sequence modeling

9. Evaluation Metrics
 MAE (Mean Absolute Error): Average absolute difference between
predictions and actuals.
 RMSE (Root Mean Square Error): Penalizes large errors more than MAE.
 MAPE (Mean Absolute Percentage Error): Scaled by the actual value
(good for business KPIs).
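
The three metrics computed by hand with NumPy (toy values for illustration):
python
import numpy as np

y_true = np.array([100.0, 110.0, 120.0])
y_pred = np.array([ 98.0, 115.0, 118.0])

mae  = np.mean(np.abs(y_true - y_pred))                   # 3.0
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))           # ~3.32
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # ~2.74 %
print(mae, rmse, mape)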

10. Applications
 Financial Forecasting: Stock price prediction.
 Healthcare: ECG signal classification.
 IoT: Sensor data stream analysis.
 Natural Language Processing: Language modeling, translation.
Transforming Data into Time Series Format – Detailed Notes
1. What Does It Mean to Transform Data into Time Series Format?
To apply time series techniques, data must be structured such that each record
corresponds to a time step. This involves:
 Ensuring the data is ordered chronologically
 Associating each record with a timestamp
 Resampling or restructuring the data as required for consistency and
completeness

2. Key Components of Time Series Data


 Timestamp (Index): A datetime field that indicates when the
measurement occurred.
 Value: The observed metric at that time (e.g., temperature, stock price).
 Optional Attributes: Day of the week, season, holiday, etc.

3. Common Data Sources to Transform


Data Type Transform Into Time Series By...
Transaction Logs Grouping by time window (e.g., daily, hourly)
Sensor Readings Aligning values to regular intervals
Web Traffic Logs Counting hits over time windows
Text Logs or Tweets Aggregating messages per time unit

4. Steps to Transform Raw Data into Time Series Format


Step 1: Parse and Format Date-Time
 Convert date fields into datetime objects using pandas.to_datetime() (in
Python).
 Set datetime as index:
python
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)
Step 2: Sort the Data
 Ensure that the data is chronologically ordered.
Step 3: Resample or Aggregate
 Resample to a consistent frequency (hourly, daily, weekly):
python
df = df.resample('D').mean() # Daily average
Step 4: Handle Missing Values
 Fill gaps using:
o Forward/Backward Fill: df.ffill(), df.bfill()
o Interpolation: df.interpolate()
o Filling with zeros or fixed values if appropriate
Step 5: Feature Engineering
 Add:
o Lag Features (e.g., y(t-1), y(t-2))
o Rolling Averages (e.g., 7-day moving average)
o Date-Time Features (e.g., day of week, month)

5. Example
Suppose you have raw sales data:
Date        Store  Sales
2023-01-01  A      120
2023-01-03  A      130
2023-01-04  A      90
Convert to Time Series:
python
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df = df.resample('D').sum().fillna(0)
This fills missing dates (e.g., 2023-01-02) with zero and ensures a consistent
daily format.

6. Tools and Libraries


 Python: pandas, numpy, matplotlib, statsmodels
 R: tsibble, forecast, lubridate
 Excel: Use pivot tables and charting tools with date formatting

7. Applications
 Forecasting future trends (e.g., sales, temperature, demand)
 Anomaly detection in IoT sensors
 Sequential modeling in finance and healthcare
Time Series Data Conversion Using Pandas and NumPy

✅ 1. Introduction
Pandas and NumPy are essential Python libraries for handling and analyzing
time series data.
 Pandas: Best for handling structured data, dates, indexing, and time-
based operations.
 NumPy: Useful for numerical computations and array manipulation.
📌 2. Creating Time Series Data in Pandas
🧾 a. From a List or Dictionary
python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5, freq='D')


data = [100, 110, 105, 120, 130]
ts = pd.Series(data, index=dates)
➕ Options:
 freq='D': Daily frequency
 'H': Hourly
 'M': Month-end
 'W': Weekly

📅 3. DateTime Conversion
🔄 Convert a column to datetime:
python
df['timestamp'] = pd.to_datetime(df['timestamp'])
Set as index:
python
df.set_index('timestamp', inplace=True)
This makes the DataFrame time-aware and ready for time series operations.
🔁 4. Resampling Time Series Data
🎯 Change frequency (e.g., daily to weekly):
python
df_resampled = df.resample('W').mean() # Weekly average
Common resample methods:
 .mean(), .sum(), .max(), .min()

📉 5. Handling Missing Time Series Data


a. Reindexing:
python
full_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
df = df.reindex(full_range)
b. Filling Methods:
python
df.ffill()        # Forward fill
df.bfill()        # Backward fill
df.interpolate()  # Linear interpolation

🔍 6. Rolling and Shifting


a. Rolling window operations:
python
df['rolling_mean'] = df['value'].rolling(window=3).mean()
b. Shifting data (lags):
python
df['lag_1'] = df['value'].shift(1)

🧮 7. Using NumPy for Time Series


a. Time-stamped arrays:
python
import numpy as np
dates = np.array(['2023-01-01', '2023-01-02'], dtype='datetime64[D]')
values = np.array([100, 200])
b. Arithmetic and differences:
python
np.diff(values) # [100]

📈 8. Visualization
python
import matplotlib.pyplot as plt
ts.plot()
plt.show()

🧠 9. Use Cases
 Stock price analysis
 Sales trend forecasting
 Sensor data smoothing
 Predictive maintenance

Summary: Common Pandas Time Series Functions


Function Purpose
pd.to_datetime() Convert to datetime
pd.date_range() Create time ranges
.resample() Frequency conversion
.rolling() Moving averages
.shift() Lag features
.interpolate() Fill missing values
Slicing and Operating on Time Series Data – Notes

1. Slicing Time Series Data


 Slicing is used to extract a portion of time series data based on time
ranges.
 Works only if the index is a DateTimeIndex.
Example:
python
df = pd.read_csv('data.csv', parse_dates=['date'])
df.set_index('date', inplace=True)

monthly_data = df.loc['2023-01']  # All data in Jan 2023

data_range = df.loc['2023-01-01':'2023-01-10']  # Data from Jan 1 to Jan 10
Using loc:
python
df.loc['2023'] # All data for 2023
df.loc['2023-06'] # All data for June 2023

2. Operating on Time Series Data


Resampling:
python
df.resample('M').sum() # Monthly total
df.resample('W').mean() # Weekly average
Rolling Window:
python
df['rolling_mean'] = df['value'].rolling(window=3).mean()
df['rolling_std'] = df['value'].rolling(window=5).std()
Shifting and Differencing:
python
df['lag_1'] = df['value'].shift(1) # Lag by 1 time step
df['diff'] = df['value'].diff() # First difference
DateTime Features:
python
df['day'] = df.index.day
df['month'] = df.index.month
df['weekday'] = df.index.dayofweek
Aggregation:
python
df.groupby(df.index.year).sum()
df.groupby(df.index.month).mean()

3. Visualizing Sliced or Operated Data


python
import matplotlib.pyplot as plt

df['value'].plot(label='Original')
df['rolling_mean'].plot(label='Rolling Mean')
plt.legend()
plt.show()
Correlation Coefficients – Detailed Notes

✅ 1. What is Correlation?
 Correlation measures the strength and direction of a linear relationship
between two variables.
 It is a statistical metric that tells how closely two variables move
together.
 Value range: from -1 to +1.
Correlation Coefficient (r) Relationship Type
+1 Perfect positive correlation
0 No correlation
-1 Perfect negative correlation
🔢 2. Types of Correlation Coefficients
a. Pearson Correlation Coefficient (r)
 Measures linear correlation between two continuous variables.
 Formula:
r = Σ(xi − x̄)(yi − ȳ) / ( √(Σ(xi − x̄)²) · √(Σ(yi − ȳ)²) )
 Assumes:
o Data is normally distributed
o Linearity and homoscedasticity
b. Spearman Rank Correlation (ρ)
 Non-parametric: Based on ranked data.
 Measures monotonic relationships (increasing or decreasing).
 Use when:
o Data is ordinal or non-normally distributed
o There are outliers
c. Kendall Tau Correlation (τ)
 Also based on rank, but focuses on the number of concordant/discordant
pairs.
 More robust for small datasets.

🧠 3. When to Use Which?


Situation Use
Linear, continuous variables Pearson
Ordinal or ranked data Spearman
Small samples, non-linear data Kendall Tau
4. Calculating Correlation in Python
python
import pandas as pd

df.corr(method='pearson') # Default
df.corr(method='spearman')
df.corr(method='kendall')
Example:
python
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

np.corrcoef(x, y) # Returns a 2x2 correlation matrix

📈 5. Visualizing Correlation
a. Correlation Heatmap
python
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(df.corr(), annot=True, cmap='coolwarm')


plt.show()
b. Scatter Plot
python
plt.scatter(df['x'], df['y'])
plt.title('Scatter Plot of x vs y')

🚧 6. Pitfalls and Misinterpretations


 Correlation ≠ Causation: Just because two variables correlate doesn’t
mean one causes the other.
 Outliers can significantly affect Pearson correlation.
 Nonlinear relationships may have low Pearson correlation even if a
strong relationship exists.

📦 7. Applications
 Feature Selection: Drop highly correlated features to avoid
multicollinearity.
 Time Series Analysis: Lag correlation helps in forecasting.
 Finance: Stock price co-movement.
 Healthcare: Biomarker and disease severity relationship.
Goal: Train a Gaussian HMM on Sequential Data
A Gaussian HMM is used when your observations are continuous numeric
values (e.g., sensor readings, speech features, stock prices).

✅ Step 1: Install Required Library


bash
pip install hmmlearn
✅ Step 2: Import Packages
python
import numpy as np
from hmmlearn import hmm

✅ Step 3: Prepare Data


Assume you have 1-dimensional continuous observations (e.g., temperatures):
python
# Example observation sequence (e.g., temperature or signal amplitude)
X = np.array([[1.0], [2.1], [1.9], [3.2], [2.8], [4.0]]).reshape(-1, 1)
Format: 2D array with shape (n_samples, n_features) – even for 1D data.

✅ Step 4: Define and Train Gaussian HMM


python
# Define the model
model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                        n_iter=100, random_state=42)

# Fit model to the data
model.fit(X)
 n_components=2: Number of hidden states
 covariance_type="diag": Diagonal covariance matrix (independent
features)
 n_iter: Number of EM iterations
✅ Step 5: Use the Trained Model
🔍 Predict Hidden States
python
hidden_states = model.predict(X)
print("Hidden states:", hidden_states)
📈 Score the Sequence
python
log_prob = model.score(X)
print("Log-likelihood of the sequence:", log_prob)
🧩 View Model Parameters
python
print("Transition matrix:\n", model.transmat_)
print("Means of each hidden state:\n", model.means_)
print("Covariances of each hidden state:\n", model.covars_)

✅ Step 6: Visualize Hidden States


python
import matplotlib.pyplot as plt

plt.plot(X, label='Observations')
plt.plot(hidden_states, label='Hidden States', linestyle='--')
plt.legend()
plt.title("Gaussian HMM Hidden State Sequence")
plt.show()

🧠 Tips for Real Data


 Scale or normalize input data before training.
 You can increase n_components to model more complex sequences.
 Use MFCC features or sensor streams for real-world time series.

UNIT 4
Image Content Analysis – Detailed Notes

✅ 1. What is Image Content Analysis?


Image Content Analysis refers to the process of understanding and extracting
meaningful information from images using computational techniques.
Goals include:
 Object detection and recognition
 Scene understanding
 Image classification
 Feature extraction

🧱 2. Basic Components of Image Analysis


Component Description
Pixels Basic unit of an image; contains intensity or color information
Channels Typically 1 (grayscale) or 3 (RGB) color channels
Resolution Size of the image (e.g., 224x224 pixels)
Histogram Distribution of pixel intensities
Edges/Contours Shape information extracted using filters (e.g., Sobel, Canny)

🔍 3. Feature Extraction from Images


a. Manual Feature Extraction
 Histogram of Oriented Gradients (HOG): Describes edges and texture
 SIFT/SURF: Detects key points and descriptors
 Color Histograms: Distribution of color intensities
b. Learned Features (Deep Learning)
 Convolutional Neural Networks (CNNs) automatically extract
hierarchical features:
o Layer 1: Edges
o Layer 2: Patterns/Textures
o Layer 3: Objects/Shapes

🧠 4. Image Classification
Objective:
Assign a class label to the entire image.
Pipeline:
1. Preprocessing: Resize, normalize
2. Feature extraction: HOG or CNN layers
3. Classifier: SVM, k-NN, or Softmax layer
python
from tensorflow.keras.applications import VGG16
model = VGG16(weights='imagenet', include_top=True)

🎯 5. Object Detection
Detects what and where objects are in an image.
Technique Description
RCNN Region proposals + CNN for classification
YOLO Real-time detection
SSD Faster with fewer computations

🧮 6. Image Segmentation
Divides image into meaningful parts or regions.
Types:
 Semantic Segmentation: Assigns label to each pixel (e.g., car, road)
 Instance Segmentation: Detects object instances (e.g., 3 cars separately)
python
# Libraries: OpenCV, skimage, torchvision

📊 7. Image Analysis Tools and Libraries


Tool/Library Purpose
OpenCV Computer vision basics
scikit-image Image processing
TensorFlow / Keras Deep learning models (CNNs, etc.)
PyTorch Customizable deep learning workflows
PIL / Pillow Basic image loading and manipulation

📈 8. Evaluation Metrics
Task Metrics
Classification Accuracy, Precision, Recall
Detection IOU (Intersection over Union), mAP
Segmentation Dice score, Pixel accuracy

🧠 9. Applications
 Facial recognition
 Autonomous vehicles
 Medical imaging (X-ray, MRI analysis)
 Satellite image analysis
 Industrial quality control
Computer Vision – Detailed Notes

✅ 1. What is Computer Vision?


Computer Vision (CV) is a field of Artificial Intelligence (AI) that trains machines
to interpret and understand the visual world using images or video.
Objective:
To extract, analyze, and understand useful information from digital images or
videos.
🧱 2. Core Tasks in Computer Vision
Task Description
Image Classification Assign a label to an image
Object Detection Identify and locate objects in an image
Semantic Segmentation Classify each pixel into a category
Instance Segmentation Distinguish between different instances of objects
Pose Estimation Detect human or object keypoints
Image Generation Generate realistic images using models like GANs

🧠 3. Image Preprocessing Techniques


 Resizing: Normalize image dimensions
 Grayscale Conversion: Reduce computational complexity
 Normalization: Scale pixel values to [0, 1] or [-1, 1]
 Data Augmentation: Improve generalization with random flips, rotation,
zoom

🔍 4. Feature Extraction Methods


a. Traditional Methods
 Edge Detection: Sobel, Canny
 Corner Detection: Harris Corner
 Texture Features: LBP (Local Binary Patterns)
 HOG (Histogram of Oriented Gradients)
b. Deep Learning-Based
 Convolutional Neural Networks (CNNs): Learn hierarchical image
features
 Transfer Learning: Use pre-trained models like VGG, ResNet, Inception,
MobileNet

🧮 5. Convolutional Neural Networks (CNNs)


Key Layers:
 Convolution Layer: Extracts local patterns
 ReLU Layer: Non-linearity
 Pooling Layer: Reduces dimensions
 Fully Connected Layer: Classifies features
 Softmax Layer: Converts scores into probabilities
Popular Architectures:
 LeNet, AlexNet, VGG, ResNet, EfficientNet

🎯 6. Object Detection
Techniques:
 R-CNN, Fast R-CNN, Faster R-CNN: Region-based CNNs
 YOLO (You Only Look Once): Real-time object detection
 SSD (Single Shot Detector): Faster than R-CNN, lower accuracy

🧠 7. Image Segmentation
 Semantic Segmentation: Pixels labeled with object classes
 Instance Segmentation: Differentiates object instances
Tools:
 U-Net (medical imaging)
 Mask R-CNN (instance segmentation)

📊 8. Evaluation Metrics
Task Metric
Classification Accuracy, Precision, Recall
Detection IOU (Intersection over Union), mAP
Segmentation Dice coefficient, Pixel Accuracy

9. Tools and Libraries


Library Use Case
OpenCV General CV tasks, image processing
Pillow (PIL) Image loading and manipulation
scikit-image Advanced image processing functions
TensorFlow/Keras, PyTorch Deep learning models
Detectron2, YOLOv5 Advanced detection and segmentation

🧠 10. Applications of Computer Vision


 Face recognition (biometrics)
 Autonomous vehicles
 Medical imaging (e.g., tumor detection)
 Retail (customer tracking, inventory)
 Agriculture (crop monitoring, disease detection)
 Industrial quality control
Edge Detection & Histogram Equalization – OpenCV

🔍 1. Edge Detection
✅ What is Edge Detection?
 Identifies boundaries of objects in images.
 Crucial for tasks like object recognition, segmentation, and feature
extraction.
🛠 Common Techniques:
Technique Description
Sobel Detects edges in horizontal & vertical axes
Laplacian Detects edges using second derivatives
Canny Advanced method for sharp edge detection

🧪 Example: Canny Edge Detection


python
import cv2
import matplotlib.pyplot as plt

# Read and convert image to grayscale


img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply Canny edge detector


edges = cv2.Canny(gray, 100, 200) # thresholds

# Display result
plt.imshow(edges, cmap='gray')
plt.title("Canny Edges")
plt.show()
 100 and 200 are lower and upper threshold values.
 You can tweak them to get sharper or softer edges.
📈 2. Histogram Equalization
✅ What is Histogram Equalization?
 Improves the contrast of an image.
 Spreads out the intensity values to enhance visual detail.
📸 Use Cases:
 Medical imaging
 Low-light or poor contrast images
 Facial recognition preprocessing

🧪 Example: Grayscale Histogram Equalization


python
import cv2
import matplotlib.pyplot as plt

# Read image in grayscale


gray = cv2.imread('image.jpg', 0)

# Apply histogram equalization


equalized = cv2.equalizeHist(gray)

# Show original and equalized images


plt.subplot(1, 2, 1)
plt.title("Original")
plt.imshow(gray, cmap='gray')
plt.subplot(1, 2, 2)
plt.title("Equalized")
plt.imshow(equalized, cmap='gray')
plt.show()

🧪 Example: Color Image Histogram Equalization


Use CLAHE (Contrast Limited Adaptive Histogram Equalization):
python
# Convert BGR to LAB
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

# Apply CLAHE to the L-channel


clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
cl = clahe.apply(l)

# Merge and convert back to BGR


merged = cv2.merge((cl, a, b))
final = cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)

✅ Summary
Technique Purpose
Edge Detection Finds object outlines
Histogram Equalization Enhances image contrast
CLAHE Adaptive contrast enhancement
Detecting SIFT Feature Points using OpenCV

✅ 1. What is SIFT?
SIFT (Scale-Invariant Feature Transform) is a computer vision algorithm used
to:
 Detect keypoints (distinct, repeatable points like corners or blobs)
 Compute descriptors (feature vectors that describe the keypoint's local
neighborhood)
🧠 SIFT is scale-invariant and rotation-invariant, making it ideal for:
 Object recognition
 Image matching
 3D reconstruction

⚙️2. Installing OpenCV with SIFT Support


SIFT is part of the contrib module in OpenCV.
bash
pip install opencv-contrib-python

🧪 3. Detecting SIFT Feature Points (Code Example)


python
import cv2
import matplotlib.pyplot as plt

# Read the image


img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Initialize SIFT detector


sift = cv2.SIFT_create()

# Detect keypoints and compute descriptors


keypoints, descriptors = sift.detectAndCompute(gray, None)

# Draw keypoints
img_with_keypoints = cv2.drawKeypoints(
    gray, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

# Display result
plt.imshow(img_with_keypoints, cmap='gray')
plt.title("SIFT Keypoints")
plt.axis('off')
plt.show()

🧮 4. Output Explanation
 keypoints: List of keypoint objects, each with location, scale, and
orientation.
 descriptors: Numpy array of 128-dimensional feature vectors.
 You can use these descriptors to match features between images.

📦 5. Applications of SIFT
Use Case Example
Object recognition Identify known objects in new scenes
Image stitching Panorama creation using matched features
Augmented Reality Anchor digital objects using real-world features
Robotics/SLAM Track position of camera in real environments

🚧 6. Notes
 SIFT is patent-free now (since 2020) and included in OpenCV by default.
 For fast but less accurate features, alternatives include:
o ORB (Oriented FAST and Rotated BRIEF) – OpenCV’s faster
alternative
o SURF (Speeded-Up Robust Features) – Faster than SIFT, but still
patented
STAR Feature Detector – OpenCV (Python)

✅ 1. What is the STAR Feature Detector?


STAR stands for CenSurE (Center Surround Extrema) detector, designed for:
 Fast and efficient keypoint detection
 Scale-invariant blob detection
 Works well for real-time applications
It is part of OpenCV’s xfeatures2d module.

⚙️2. Installing Required Package


You need OpenCV with contrib modules:
bash
pip install opencv-contrib-python
🧪 3. Code Example – Detecting Features Using STAR
python
import cv2
import matplotlib.pyplot as plt

# Load image and convert to grayscale


img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Initialize the STAR detector


star = cv2.xfeatures2d.StarDetector_create()

# Detect keypoints
keypoints = star.detect(gray, None)

# Draw keypoints
img_with_kp = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))

# Show the result


plt.imshow(cv2.cvtColor(img_with_kp, cv2.COLOR_BGR2RGB))
plt.title("STAR Feature Keypoints")
plt.axis('off')
plt.show()

📈 4. Output Explanation
 keypoints: Detected interest points in the image
 cv2.drawKeypoints: Visualizes these keypoints on the original image

🧠 5. Applications
 Feature matching (combined with descriptors)
 Object tracking
 Real-time mobile/robotic vision

🔁 6. STAR vs Other Feature Detectors


Detector Speed Scale Invariant Rotation Invariant Descriptor
SIFT Medium ✅ ✅ ✅
ORB Fast ✅ ✅ ✅
STAR Fast ✅ ❌ ❌ (detector only)
Creating Features Using Visual Codebook & Vector Quantization

✅ 1. What is a Visual Codebook?


A Visual Codebook is a dictionary of "visual words" created by clustering local
image features (e.g., SIFT, ORB, SURF) from a training set of images.
Analogy:
Just like a text document can be represented by a bag of words, an image can
be represented as a bag of visual words.

🧠 2. Why Use a Visual Codebook?


 Converts variable-length local feature descriptors into a fixed-length
feature vector.
 Enables classification of images using traditional ML algorithms (SVM,
kNN, etc.).
 Useful in:
o Object classification
o Scene recognition
o Image retrieval

🧪 3. Step-by-Step Process
🔹 Step 1: Extract Local Features
Use SIFT/ORB descriptors from multiple images:
python
import cv2

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
Collect all descriptors from the training dataset.

🔹 Step 2: Build the Codebook (Clustering)


Use KMeans clustering to group similar descriptors into k clusters:
python
from sklearn.cluster import KMeans
import numpy as np

# Assume all_descriptors is an (N x 128) matrix of SIFT descriptors
# collected from all training images
kmeans = KMeans(n_clusters=100)
kmeans.fit(all_descriptors)
visual_words = kmeans.cluster_centers_
Each cluster center is a visual word.

🔹 Step 3: Vector Quantization


For each image:
 Assign each local descriptor to the nearest cluster center (visual word).
 Count frequency of visual words → Histogram (bag of visual words).
python
# For a new image
histogram = np.zeros(len(kmeans.cluster_centers_))

for descriptor in descriptors:
    idx = kmeans.predict([descriptor])[0]
    histogram[idx] += 1
This histogram becomes the feature vector for the image.

🔹 Step 4: Use in Classifier


Once you convert each image into a fixed-length vector:
python
from sklearn.svm import SVC

# X = list of histograms for training images, y = labels


model = SVC()
model.fit(X, y)

📦 4. Summary Table
Step Technique
Feature extraction SIFT, ORB
Clustering KMeans
Quantization Nearest cluster center
Feature vector Histogram (BoVW)
Classifier SVM, kNN, etc.

🎯 5. Applications
 Image classification
 Scene and object recognition
 Image retrieval systems

📝 Bonus: Key Terms


 BoVW: Bag of Visual Words
 Codebook: Dictionary of learned visual words
 Vector Quantization: Mapping high-dimensional descriptors to nearest
visual word
UNIT 5
Biometric Face Recognition – Notes

✅ 1. Introduction to Face Recognition


 Face recognition is a biometric method that identifies or verifies
individuals using their facial features.
 Commonly used in authentication systems, security, and surveillance.
 Two main tasks:
o Face verification: Is this person who they claim to be?
o Face identification: Who is this person?

🔁 2. Face Recognition Pipeline


1. Face Detection – Locate face in the image.
2. Face Alignment – Normalize the face (position, scale, rotation).
3. Feature Extraction – Convert facial region into a numerical feature
vector.
4. Face Matching – Compare vectors using a similarity/distance metric
(e.g., Euclidean).

📸 3. Face Detection Techniques


Technique Description
Haar Cascades Fast, classical method in OpenCV
HOG + SVM dlib-based, robust in frontal detection
DNN-based models Deep models like MTCNN, SSD, RetinaFace
🧠 4. Feature Extraction Techniques
Method Description
LBPH Local Binary Patterns Histograms
Eigenfaces PCA-based face recognition
Fisherfaces LDA-based for improved class separation
FaceNet Deep learning model producing embeddings
VGGFace Deep CNN model trained on face datasets

🤖 5. Face Recognition using Deep Learning


 Models like FaceNet or DeepFace produce embeddings – high-
dimensional vectors that represent facial features.
 Matching is done by comparing the Euclidean distance between
embeddings:
o Smaller distance → same person
o Larger distance → different person
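
A minimal sketch of this comparison (emb1 and emb2 are placeholder embedding vectors from any such model; the 0.6 threshold is a common heuristic, not a universal constant):
python
import numpy as np

def is_same_person(emb1, emb2, threshold=0.6):
    # Smaller Euclidean distance -> more likely the same person
    distance = np.linalg.norm(np.asarray(emb1) - np.asarray(emb2))
    return distance < threshold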

🧪 6. OpenCV Example using LBPH


python
import cv2
import numpy as np

# Create recognizer (requires the opencv-contrib-python package)
recognizer = cv2.face.LBPHFaceRecognizer_create()

# Train the recognizer (labels must be a NumPy integer array)
recognizer.train(training_images, np.array(labels))

# Predict
label, confidence = recognizer.predict(test_image)
 training_images: List of grayscale face images
 labels: Corresponding numeric labels (e.g., 0 = Alice, 1 = Bob)
 confidence: Lower is better (0 is a perfect match)

📏 7. Evaluation Metrics
Metric Description
Accuracy Overall correct predictions
FAR False Acceptance Rate (unauthorized accepted)
FRR False Rejection Rate (authorized rejected)
EER Equal Error Rate (FAR = FRR, ideal balance)
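
FAR and FRR can be estimated directly from matcher scores. A sketch assuming a distance-based matcher and two placeholder arrays, genuine_dists (same-person comparisons) and impostor_dists (different-person comparisons):
python
import numpy as np

def far_frr(genuine_dists, impostor_dists, threshold):
    # A match is declared when distance < threshold
    frr = np.mean(np.asarray(genuine_dists) >= threshold)  # authorized rejected
    far = np.mean(np.asarray(impostor_dists) < threshold)  # unauthorized accepted
    return far, frr

# Sweeping the threshold and finding where FAR == FRR gives the EER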

🧰 8. Tools and Libraries


Library Use
OpenCV Face detection, LBPH face recognition
dlib Facial landmark detection, HOG + SVM
face_recognition Built on dlib + FaceNet
TensorFlow/PyTorch Custom deep learning face recognition models

🌍 9. Applications of Face Recognition


 Mobile phone unlocking (e.g., Face ID)
 Passport and immigration control
 Attendance tracking systems
 Public surveillance and smart cities
 Personalized advertising and smart homes
⚠️10. Challenges in Face Recognition
Challenge Explanation
Lighting Conditions Drastically change appearance
Pose Variation Non-frontal faces are harder to recognize
Aging Changes over time affect performance
Occlusion Glasses, masks, hair can hide features
Spoofing Photos, videos used to trick the system
Face Detection from Image and Video Using OpenCV

✅ 1. What is Face Detection?


Face detection locates the presence and position of human faces in digital
images or video frames.
It's the first step in applications like:
 Face recognition
 Emotion detection
 Face tracking

2. Tools Required
Install OpenCV:
bash
pip install opencv-python

3. Face Detection from Image (Haar Cascades)


🔹 Code Example:
python
import cv2

# Load Haar cascade
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                     'haarcascade_frontalface_default.xml')

# Load and convert image to grayscale
img = cv2.imread('face.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw rectangles around faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

# Display result
cv2.imshow('Detected Face', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

🎥 4. Face Detection from Live Video (Webcam)


🔹 Code Example:
python
import cv2

# Load Haar cascade
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                     'haarcascade_frontalface_default.xml')

# Open webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)

    # Draw rectangles
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # Show video; press 'q' to quit
    cv2.imshow('Video Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

⚙️5. Parameters Explained


Parameter Purpose
scaleFactor How much the image size is reduced at each scale
minNeighbors How many neighbors each candidate should have to retain it
(x, y, w, h) Rectangle coordinates of detected face
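
For example, detection can be made stricter by tuning these parameters (the values below are illustrative; minSize is an optional third knob):
python
# Smaller scale steps, more neighbor confirmations,
# and ignore candidate faces smaller than 60x60 pixels
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.05,
                                      minNeighbors=7, minSize=(60, 60))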

🔁 6. Alternatives to Haar Cascades


Method Library Notes
HOG + SVM dlib Robust for frontal faces
MTCNN mtcnn Deep learning-based, high accuracy
DNN Model OpenCV Uses pre-trained deep neural networks

🎯 7. Applications
 Attendance systems (with face recognition)
 Video surveillance
 Emotion analysis
 Augmented reality (face filters)
Resizing and Scaling Images – OpenCV (Python)
✅ 1. What is Resizing?
Resizing refers to changing the dimensions (width and height) of an image.
🛠 Why Resize?
 Normalize input sizes for machine learning models
 Reduce image size for faster processing
 Increase resolution for visualization or printing

🔁 2. Resizing with cv2.resize()


📌 Syntax:
python
resized_img = cv2.resize(src, dsize[, fx[, fy[, interpolation]]])
 src: Source image
 dsize: Desired size as (width, height)
 fx, fy: Scaling factors for x and y (optional if dsize is given)
 interpolation: Method of interpolation

🧪 Example 1: Resize by Fixed Dimensions


python
import cv2

img = cv2.imread('image.jpg')
resized = cv2.resize(img, (200, 100)) # Resize to 200x100 pixels
cv2.imshow('Resized', resized)
cv2.waitKey(0)
cv2.destroyAllWindows()
🧪 Example 2: Resize by Scale Factor
python
# Half size
scaled = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR)

🧮 3. Interpolation Methods
Method OpenCV Constant Use Case
Nearest-neighbor cv2.INTER_NEAREST Fastest, blocky results
Bilinear cv2.INTER_LINEAR Good for shrinking
Bicubic cv2.INTER_CUBIC Smoother, better for zooming
Lanczos cv2.INTER_LANCZOS4 High-quality scaling
Area-based cv2.INTER_AREA Best for reducing size

🧪 Example 3: Resize with Different Interpolations


python
resize_area = cv2.resize(img, (300, 200), interpolation=cv2.INTER_AREA)
resize_cubic = cv2.resize(img, (300, 200), interpolation=cv2.INTER_CUBIC)

🎯 4. Applications
 Resize images for input into CNNs (e.g., 224×224 for ResNet)
 Thumbnail generation
 Fast image compression for transmission
 Maintaining aspect ratio in datasets
⚠️5. Tips
 Maintain aspect ratio to avoid distortion:
python
h, w = img.shape[:2]
new_w = 300
new_h = int((new_w / w) * h)
resized = cv2.resize(img, (new_w, new_h))
Building a Face Detector Using Haar Cascades (OpenCV)

✅ 1. What is Haar Cascade?


 Haar Cascade is a machine learning-based object detection method.
 It uses Haar features and a cascade of classifiers trained on positive and
negative images.
 OpenCV provides pre-trained classifiers for face, eyes, smiles, etc.

🧪 2. Requirements
Install OpenCV:
bash
pip install opencv-python

📥 3. Load Haar Cascade Classifier


OpenCV includes built-in XML files:
python
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                     'haarcascade_frontalface_default.xml')
You can also use:
 haarcascade_profileface.xml for side profiles
 haarcascade_eye.xml, etc.

📸 4. Face Detection from Image


python
import cv2

# Load classifier and image
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                     'haarcascade_frontalface_default.xml')
img = cv2.imread('face.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw rectangles
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Face Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
🎥 5. Face Detection from Video (Webcam)
python
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)

    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)

    cv2.imshow('Face Detector', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

⚙️6. Key Parameters in detectMultiScale()


Parameter Description
scaleFactor Image size reduction at each scale (e.g., 1.1)
minNeighbors How many neighbors to confirm detection
minSize Minimum size of the detected face (optional)

🧠 7. Advantages and Limitations


✅ Pros:
 Fast and efficient
 Easy to implement
 Works in real-time
❌ Cons:
 Sensitive to lighting, angle, and scale
 May give false positives
 Outperformed by modern DNNs (e.g., MTCNN, Dlib, YOLO)

🔧 8. Applications
 Real-time surveillance
 Mobile phone face unlock
 Attendance tracking
 Access control systems
Face Detection on a Grayscale Image using OpenCV (Haar Cascades)

✅ 1. Why Grayscale?
 Haar Cascades and many traditional detectors work on grayscale images
for:
o Simplicity (single channel)
o Performance (faster)
o Accuracy (trained on gray images)

🧪 2. Step-by-Step Python Code


🔹 Install OpenCV:
bash
pip install opencv-python
🔹 Face Detection on Grayscale Image:
python
import cv2

# Load Haar cascade for face detection
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                     'haarcascade_frontalface_default.xml')

# Load the image
img = cv2.imread('face.jpg')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces in the grayscale image
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw rectangles around detected faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)

# Display the output
cv2.imshow('Detected Faces', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

⚙️3. Important Parameters


Parameter Description
scaleFactor Specifies how much the image size is reduced at each scale
minNeighbors Minimum number of neighbors each rectangle should have to retain it

🧠 4. Summary
 Convert image to grayscale using cv2.cvtColor().
 Load Haar classifier using cv2.CascadeClassifier().
 Detect faces with detectMultiScale().
 Draw rectangles using cv2.rectangle().

🔎 Applications
 Image preprocessing for facial recognition
 Single-image face detection
 Static attendance systems
Principal Components Analysis (PCA) – Notes & Python Code

✅ 1. What is PCA?
Principal Components Analysis (PCA) is a statistical technique used to:
 Reduce the dimensionality of data
 Retain the most important information (variance)
 Find new axes (principal components) that are linear combinations of
original features

🧠 2. Why Use PCA?


Use Case Benefit
High-dimensional data Reduce noise and redundancy
Image compression Keep most of the visual info with fewer features
Face recognition (Eigenfaces) Represent faces compactly
Preprocessing Improve training speed and performance

⚙️3. PCA Process (Concept)


1. Standardize the data
2. Compute the covariance matrix
3. Calculate eigenvalues and eigenvectors
4. Sort eigenvectors by eigenvalues
5. Select top k eigenvectors to form a new feature space
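
A minimal NumPy sketch of these five steps (the function name is illustrative; scaling to unit variance is omitted for brevity):
python
import numpy as np

def pca_manual(X, k):
    # X: (n_samples, n_features) data matrix; k: components to keep
    X_centered = X - X.mean(axis=0)            # 1. center the data
    cov = np.cov(X_centered, rowvar=False)     # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # 3. eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]          # 4. sort by eigenvalue (descending)
    W = eigvecs[:, order[:k]]                  # 5. top-k eigenvectors
    return X_centered @ W                      # project into the new feature space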

🧪 4. PCA in Python using sklearn


🔹 Example: PCA on Image Data
python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import cv2
import matplotlib.pyplot as plt

# Load grayscale image; treat each pixel row as one sample
# (the image needs at least 100 columns for 100 components)
img = cv2.imread('face.jpg', 0)

# Standardize data (column-wise)
scaler = StandardScaler()
img_scaled = scaler.fit_transform(img)

# Apply PCA (reduce each row to 100 principal components)
pca = PCA(n_components=100)
img_pca = pca.fit_transform(img_scaled)

# Reconstruct image from PCA
img_reconstructed = pca.inverse_transform(img_pca)
img_reconstructed = scaler.inverse_transform(img_reconstructed)

# Display
plt.subplot(1, 2, 1)
plt.title("Original")
plt.imshow(img, cmap='gray')

plt.subplot(1, 2, 2)
plt.title("PCA Reconstruction")
plt.imshow(img_reconstructed, cmap='gray')
plt.show()

📊 5. Choosing the Number of Components


python
# data: an (n_samples x n_features) matrix, standardized as above
pca = PCA().fit(data)
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Variance Explained')
plt.title('PCA – Variance Retained')
plt.grid()
plt.show()
Use the elbow method or retain 95% variance to select components.
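Alternatively, sklearn's PCA accepts a float between 0 and 1 as a variance target, so the 95% rule can be applied directly:
python
# Keep however many components are needed to retain 95% of the variance
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(data)
print(pca.n_components_)  # number of components actually kept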

🧮 6. PCA for Face Recognition – Eigenfaces


 Treat each image as a high-dimensional vector.
 PCA reduces image to fewer dimensions → Eigenfaces.
 Used in OpenCV's face recognition systems.

🧰 7. PCA in OpenCV
python
mean, eigenvectors = cv2.PCACompute(data, mean=None, maxComponents=50)

🔁 8. Summary
Step Tool/Function
Standardization StandardScaler or manual
PCA sklearn.decomposition.PCA
Reconstruction inverse_transform()
Visualization matplotlib.pyplot
PCA in Face Recognition Systems (Eigenfaces Method)

✅ 1. What Is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique
that transforms high-dimensional data (like pixel values of an image) into a
lower-dimensional form that retains most of the variation in the data.

🧠 2. Why Use PCA in Face Recognition?


 Faces are high-dimensional (e.g., 100×100 = 10,000 pixels).
 PCA projects face images to a lower-dimensional space while preserving
the most significant variations.
 This helps in:
o Faster computation
o Noise reduction
o Improved classification accuracy

🧪 3. Concept of Eigenfaces
 PCA learns eigenvectors (called eigenfaces) from a set of training face
images.
 Each face is represented as a weighted sum of these eigenfaces.
 Face recognition involves projecting a test image into this space and
comparing its Euclidean distance to training face vectors.
🔁 4. PCA-Based Face Recognition Pipeline
1. Collect training face images (same size)
2. Flatten each image into a 1D vector
3. Construct a matrix of training faces
4. Apply PCA:
o Compute mean face
o Subtract mean face from each image
o Calculate covariance matrix
o Extract eigenfaces (top eigenvectors)
5. Project all training images to eigenface space
6. For a test image:
o Subtract mean
o Project to same space
o Compare with training projections (e.g., Euclidean distance)
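
Before the OpenCV shortcut below, here is step 6 in plain NumPy. This is a sketch: mean_face, eigenfaces (one eigenvector per column), train_proj, and train_labels are assumed to come from the PCA steps above.
python
import numpy as np

def match_face(test_img, mean_face, eigenfaces, train_proj, train_labels):
    # Subtract the mean and project the test image into eigenface space
    test_vec = test_img.flatten().astype(np.float64) - mean_face
    test_proj = eigenfaces.T @ test_vec            # weights on each eigenface
    # Nearest neighbour among training projections (Euclidean distance)
    dists = np.linalg.norm(train_proj - test_proj, axis=1)
    best = int(np.argmin(dists))
    return train_labels[best], dists[best]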

📷 5. Code Example (Using OpenCV)


python
import cv2
import numpy as np

# Load face images and labels (all images must have the same size)
images = [cv2.imread(f'face{i}.pgm', 0) for i in range(1, 6)]
labels = [0, 1, 2, 3, 4]  # Example labels

# Create recognizer using Eigenfaces (requires opencv-contrib-python)
model = cv2.face.EigenFaceRecognizer_create()
model.train(images, np.array(labels))

# Predict a new face
test_img = cv2.imread('test_face.pgm', 0)
label, confidence = model.predict(test_img)

print("Predicted Label:", label)
print("Confidence Score:", confidence)
💡 EigenFaceRecognizer automatically applies PCA and builds the recognition system.

📊 6. Advantages of PCA in Face Recognition


Benefit Explanation
Dimensionality reduction Less storage and faster computation
Noise reduction Removes irrelevant pixel-level variation
Data efficiency Can recognize faces even with a small dataset

⚠️7. Limitations
 Sensitive to illumination, pose, and expression
 Poor performance under occlusion or real-time conditions
 Modern systems prefer deep learning (e.g., FaceNet) for high accuracy

📚 8. Applications
 Face unlock systems
 Attendance tracking
 Image-based login systems
 Embedded systems with low compute power

📌 9. Summary
Task Tool/Method
Dimensionality Reduction PCA
Face Representation Eigenfaces (principal components)
Classifier Nearest neighbor / Euclidean distance
Libraries OpenCV, sklearn, numpy
Kernel Principal Components Analysis (Kernel PCA)

✅ 1. What Is Kernel PCA?


 Kernel PCA extends standard PCA by applying the kernel trick to perform
PCA in a higher-dimensional feature space, allowing it to:
o Handle non-linear structures in the data.
o Capture complex patterns missed by standard PCA.

🧠 2. Why Use Kernel PCA?


Scenario Kernel PCA Helps When...
Data is not linearly separable PCA fails to find a clear boundary
You want to use non-linear features E.g., curved decision boundaries
Working with biometric features Like images, face data, or speech

⚙️3. Kernel PCA Process (Simplified)


1. Map input data into a high-dimensional space using a kernel function.
2. Compute the kernel matrix (similarity matrix).
3. Perform eigen-decomposition on the kernel matrix.
4. Project the data into a reduced space using the top eigenvectors.
🧪 4. Code Example Using Scikit-Learn
python
from sklearn.decomposition import KernelPCA
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Create non-linear dataset (concentric circles)
X, y = make_circles(n_samples=100, factor=.3, noise=.05)

# Apply Kernel PCA with RBF (Gaussian) kernel
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)

# Visualize result
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.title("Data projected using Kernel PCA (RBF)")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.grid(True)
plt.show()

🔁 5. Supported Kernels in sklearn


Kernel Type Description
'linear' Standard PCA
'poly' Polynomial kernel
'rbf' Radial Basis Function (Gaussian kernel)
'sigmoid' Hyperbolic tangent kernel

📏 6. Parameters in KernelPCA
Parameter Meaning
n_components Number of principal components to keep
kernel Kernel type ('linear', 'rbf', etc.)
gamma Kernel coefficient for RBF, poly, sigmoid
degree Degree for polynomial kernel

🧠 7. Applications
 Face recognition with non-linear features
 Image compression
 Voice and audio pattern recognition
 Handwriting or gesture analysis

⚖️8. PCA vs. Kernel PCA


Feature PCA Kernel PCA
Handles non-linearity ❌ ✅
Complexity Lower Higher
Flexibility Limited to linear Depends on kernel

🧩 9. Limitations
 Computationally expensive for large datasets
 Requires tuning kernel parameters (e.g., gamma)
 May not be interpretable like linear PCA
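
Because gamma has no unsupervised ground truth, one common workaround is to tune it through a downstream classifier. A sketch assuming labeled data X, y:
python
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression

# Score each gamma by the accuracy of a classifier trained on the projection
pipe = Pipeline([('kpca', KernelPCA(n_components=2, kernel='rbf')),
                 ('clf', LogisticRegression())])
grid = GridSearchCV(pipe, {'kpca__gamma': [0.1, 1, 5, 15, 50]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)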
Python Script: Kernel PCA + Plot
python
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler

# 1. Generate a synthetic non-linear dataset (concentric circles)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# 2. Standardize the data (important for many kernels)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Apply Kernel PCA with an RBF (Gaussian) kernel
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X_scaled)

# 4. Plot the transformed data
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y, cmap='coolwarm',
                      edgecolors='k')
plt.title("Kernel PCA (RBF Kernel) Transformed Data")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.grid(True)
plt.colorbar(scatter, label="Class")
plt.tight_layout()
plt.show()

🔍 What This Does:
 make_circles: Creates non-linear 2D data
 KernelPCA with RBF kernel: Maps data to a higher-dimensional space and reduces it back to 2D
 Color-coded scatter plot: Shows how the classes are separated after transformation

✅ To Run:
 Save the code to a .py file and run it in any Python environment with
matplotlib and scikit-learn installed.
Install requirements if needed:
bash
pip install matplotlib scikit-learn

Independent Components Analysis (ICA)

✅ 1. What is ICA?
Independent Component Analysis (ICA) is a computational technique for
separating a multivariate signal into additive, independent non-Gaussian
components.
It is often used to solve the Blind Source Separation problem, such as:
 Separating mixed audio (cocktail party problem)
 Removing noise/artifacts from EEG or images
 Unmixing face images from compressed sources

📌 2. Difference Between PCA and ICA


Feature PCA ICA
Assumes Components are uncorrelated Components are statistically independent
Goal Maximize variance Maximize statistical independence
Basis Vectors Orthogonal Not necessarily orthogonal
Use Case Dimensionality reduction Signal separation

🔬 3. ICA Process (Conceptual Steps)


1. Center and whiten the input data.
2. Iteratively adjust weights to maximize independence.
3. Estimate the independent components (ICs).

🧪 4. Python Example Using Scikit-Learn


💡 Use Case: Separate two mixed signals
python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA

# 1. Create sample signals
np.random.seed(42)
n_samples = 2000
time = np.linspace(0, 8, n_samples)

s1 = np.sin(2 * time)           # Signal 1: sinusoidal
s2 = np.sign(np.sin(3 * time))  # Signal 2: square signal

S = np.c_[s1, s2]
S += 0.1 * np.random.normal(size=S.shape)  # Add noise
S /= S.std(axis=0)                         # Standardize

# 2. Mix the signals
A = np.array([[1, 1], [0.5, 2]])  # Mixing matrix
X = S @ A.T                       # Mixed signals

# 3. Apply ICA
ica = FastICA(n_components=2)
S_ica = ica.fit_transform(X)  # Reconstruct signals
A_ica = ica.mixing_           # Estimated mixing matrix

# 4. Plot results
plt.figure(figsize=(10, 6))

plt.subplot(3, 1, 1)
plt.title('Original Signals')
plt.plot(S)

plt.subplot(3, 1, 2)
plt.title('Mixed Signals')
plt.plot(X)

plt.subplot(3, 1, 3)
plt.title('Recovered Signals (ICA)')
plt.plot(S_ica)

plt.tight_layout()
plt.show()

🔎 5. Applications of ICA
Domain Use Case
Biometric Systems Face feature separation, blind denoising
Audio Processing Separate voices/sources (e.g., music)
Medical Imaging Remove noise/artifacts from EEG/ECG
Telecommunications Separate mixed signals in antennas

⚙️6. Important Notes


 ICA assumes non-Gaussian source signals.
 Works best when source signals are independent.
 PCA is often used before ICA for whitening.
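In scikit-learn's FastICA this whitening step is built in and exposed via the whiten parameter (the accepted values depend on your scikit-learn version; recent releases take 'unit-variance'):
python
from sklearn.decomposition import FastICA

# Whitening (a PCA-like decorrelation) is applied internally before ICA
ica = FastICA(n_components=2, whiten='unit-variance', random_state=0)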

🧠 Summary
Step Description
Input Mixed signals
Output Independent components
Common Tool FastICA from sklearn
Key Requirement Independence and non-Gaussian data
