
MODULE-5

1. Regression analysis, Different types of regression [10 marks]

Regression analysis is a statistical technique for estimating the relationships between variables. It predicts the value of a dependent variable based on one or more independent variables. Common types include simple linear, multiple linear, polynomial, and logistic regression.
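Below is a minimal sketch of one common type, simple linear regression, fitted by least squares with NumPy; the data values are invented purely for illustration.

import numpy as np

# Hypothetical data: advertising spend (independent variable) vs. sales (dependent variable)
x = np.array([10, 20, 30, 40, 50], dtype=float)
y = np.array([25, 41, 62, 79, 101], dtype=float)

# Fit y = slope * x + intercept by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Predict the dependent variable for a new value of the independent variable
x_new = 60
y_pred = slope * x_new + intercept
print(f"y = {slope:.2f}*x + {intercept:.2f}; predicted y at x = 60: {y_pred:.1f}")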
2. Explain the Apriori algorithm 10M

Apriori Algorithm Overview

The Apriori Algorithm is widely used for frequent itemset mining and association rule
mining. It identifies frequent patterns in a transactional database by applying the Apriori
Principle:
If an itemset is frequent, all of its subsets must also be frequent. Conversely, if an
itemset is not frequent, none of its supersets can be frequent.

This principle helps to significantly reduce the search space for frequent itemsets.

Steps of the Apriori Algorithm

1. Identify Single Frequent Items


o Scan the transactional database to find frequent individual items with
support greater than or equal to the minimum support threshold.
o These items form the frequent 1-itemsets (L₁).
2. Generate Candidate Itemsets
o Combine frequent k-itemsets from the previous iteration to form (k+1)-itemsets.
o Use the Apriori Principle to prune candidate itemsets containing
infrequent subsets.
3. Scan the Database for Support Count
o Count the occurrences of candidate itemsets in the transactional database
and retain only those meeting the minimum support threshold.
4. Repeat Until No Larger Frequent Itemsets Exist
o Continue generating and pruning candidate itemsets until no new frequent
itemsets can be found.
5. Generate Association Rules
o Use frequent itemsets to create association rules with confidence greater
than or equal to the minimum confidence threshold.

Pseudocode for Apriori Algorithm

Input:
  D - Transactional database
  min_support - Minimum support threshold

Output:
  F - Set of all frequent itemsets

Steps:
1. Initialize:
   k = 1
   F₁ = {frequent 1-itemsets with support ≥ min_support}

2. Repeat until Fₖ is empty:
   a. Generate candidate itemsets Cₖ₊₁ of size k+1 from Fₖ:
      - Cₖ₊₁ = {itemsets of size k+1 formed by joining itemsets in Fₖ}
      - Prune candidates in Cₖ₊₁ where any subset is not in Fₖ (Apriori Principle)
   b. Count the support for each candidate itemset in Cₖ₊₁ by scanning the database.
   c. Select itemsets from Cₖ₊₁ that meet the min_support threshold:
      Fₖ₊₁ = {c ∈ Cₖ₊₁ | support(c) ≥ min_support}
   d. Set k = k + 1

3. Combine all frequent itemsets:
   F = ⋃ Fₖ (for all k)

4. Return F
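A minimal Python sketch of the same procedure (written for clarity rather than efficiency; the function and example database are illustrative):

from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (frozensets) with support >= min_support."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]

    # Frequent 1-itemsets (L1)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c / n >= min_support}
    frequent = {s: counts[s] / n for s in current}

    k = 1
    while current:
        # Join step: form (k+1)-itemsets from frequent k-itemsets
        candidates = set()
        for a in current:
            for b in current:
                union = a | b
                if len(union) == k + 1:
                    # Prune step (Apriori Principle): every k-subset must itself be frequent
                    if all(frozenset(sub) in current for sub in combinations(union, k)):
                        candidates.add(union)
        # Count support for surviving candidates with one database scan
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update({c: counts[c] / n for c in current})
        k += 1
    return frequent

# Example usage on a tiny hypothetical database
db = [{"milk", "bread"}, {"milk", "butter"}, {"bread", "butter"}, {"milk", "bread", "butter"}]
print(apriori(db, min_support=0.5))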
3. Define Web mining and explain its classifications and their applications

Definition of Web Mining

Web mining is the process of discovering patterns, insights, and useful information from
the World Wide Web by applying data mining techniques. It involves analyzing web
data such as content, structure, and usage to improve web functionality, enhance user
experience, and extract valuable knowledge for businesses and researchers.

Classification of Web Mining and Applications

Web mining is broadly classified into three types based on the type of data being
analyzed:

1. Web Content Mining

• Definition: Focuses on extracting useful information from the content available on web pages.
• Data Sources: Text, images, videos, audio, forms, user-generated content, and application data.
• Applications:
o Content Optimization: Analyze and improve content quality to attract
more visitors.
o Search Engine Optimization (SEO): Structure and enhance content to
improve search rankings.
o Sentiment Analysis: Understand user opinions on products, services, or
topics.
o Clustering and Categorization: Group similar web content for better
navigation.
2. Web Structure Mining

• Definition: Analyzes the structure of hyperlinks between web pages to understand relationships and connectivity.
• Key Concepts:
o Hubs: Pages with a large number of outbound links (e.g., Yahoo.com).
o Authorities: Pages with the most inbound links, providing authoritative
information (e.g., Mayoclinic.com).
• Applications:
o Search Engine Ranking: Algorithms like PageRank analyze hyperlink
structures to rank pages.
o Link Optimization: Improve website interlinking to increase visibility.
o Network Analysis: Study the interconnectedness of websites to find
influential nodes.

3. Web Usage Mining

• Definition: Focuses on extracting patterns from data generated by user interactions with web pages.
• Data Sources:
o Server logs, browser logs, cookies, and clickstream data.
• Applications:
o User Behavior Analysis: Understand browsing patterns and preferences.
o Personalization: Create tailored recommendations for users based on usage
patterns.
o Marketing Campaigns: Evaluate the effectiveness of online
advertisements.
o Website Design: Optimize layouts and navigation based on user activity.
Comparison Table

Aspect       | Web Content Mining                | Web Structure Mining              | Web Usage Mining
Focus        | Content of web pages              | Hyperlink structure of web pages  | User interactions with web pages
Data Sources | Text, multimedia, forms           | Hyperlinks                        | Server logs, cookies, clickstream
Techniques   | Text mining, multimedia analysis  | Graph theory, PageRank            | Pattern mining, clickstream analysis
Applications | Content improvement, SEO          | Search ranking, network analysis  | User behavior, personalization

4. Define social network. Explain the social network as graphs, with centralities, ranking and analytics

Definition of Social Network

A social network is a structure consisting of individuals or entities (nodes) connected by relationships (edges), such as friendships, collaborations, communications, or shared interests. Social networks are widely used in sociology, psychology, marketing, and data analysis to understand interactions and influence within a group or community.

Social Network as Graphs

A social network can be represented as a graph in which:

1. Nodes (Vertices): Represent individuals or entities in the network.


2. Edges (Links): Represent the relationships or interactions between nodes.
o Directed Graph: Relationships have direction (e.g., follower-following in
Twitter).
o Undirected Graph: Relationships are mutual (e.g., Facebook friendships).
3. Weights: Edges can have weights representing the strength or frequency of the
relationship (e.g., the number of messages exchanged).
Centralities in Social Network Analysis

Centrality measures are used to identify the most influential or important nodes in a
network. Key centrality metrics include:

1. Degree Centrality:
o Measures the number of direct connections a node has.
o Application: Identifying highly connected individuals in a network (e.g.,
influencers in social media).
2. Closeness Centrality:
o Measures how close a node is to all other nodes in the network.
o Application: Finding nodes that can quickly spread information.
3. Betweenness Centrality:
o Measures how often a node acts as a bridge along the shortest path between
two other nodes.
o Application: Identifying nodes that control the flow of information.
4. Eigenvector Centrality:
o Measures the influence of a node based on the importance of its neighbors.
o Application: Identifying highly influential individuals in connected
communities.

Ranking in Social Networks

Ranking involves identifying the most important nodes in a social network based on
centrality metrics or specialized algorithms.

• PageRank Algorithm: Assigns a rank to nodes based on the number and quality
of links pointing to them. It is widely used in web search engines like Google.
• Katz Centrality: Similar to eigenvector centrality, but considers nodes at a
distance with a diminishing weight.
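A small illustrative sketch of these measures, assuming the NetworkX library and a made-up friendship graph:

import networkx as nx

# Hypothetical undirected friendship network (nodes = people, edges = friendships)
G = nx.Graph()
G.add_edges_from([
    ("Asha", "Bala"), ("Asha", "Chen"), ("Asha", "Deepa"),
    ("Bala", "Chen"), ("Deepa", "Esha"), ("Esha", "Farid"),
])

print("Degree:     ", nx.degree_centrality(G))       # number of direct connections (normalized)
print("Closeness:  ", nx.closeness_centrality(G))    # how near a node is to all other nodes
print("Betweenness:", nx.betweenness_centrality(G))  # how often a node bridges shortest paths
print("Eigenvector:", nx.eigenvector_centrality(G))  # influence based on neighbours' importance
print("PageRank:   ", nx.pagerank(G))                # link-based ranking of nodes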

Social Network Analytics

Analytics focuses on extracting patterns, insights, and actionable information from social
networks. Key analyses include:

1. Community Detection:
o Identifies groups of nodes that are more densely connected within the group
than with the rest of the network.
o Applications: Finding social groups, market segmentation.
2. Influence Analysis:
o Identifies key influencers who can drive behaviors or spread information
efficiently.
o Applications: Marketing, political campaigns.
3. Sentiment Analysis:
o Analyzes the sentiment (positive, negative, neutral) expressed in
interactions.
o Applications: Brand monitoring, customer feedback analysis.
4. Network Visualization:
o Graphical representation of the network to identify patterns and clusters.
o Applications: Understanding the structure of social interactions.
5. Anomaly Detection:
o Identifies unusual patterns or outliers in the network.
o Applications: Fraud detection, cybersecurity.

Applications of Social Network as Graphs

1. Social Media Analysis:


o Understanding user interactions and identifying influencers.
2. Epidemiology:
o Studying the spread of diseases through contact networks.
3. Criminal Network Analysis:
o Identifying key individuals in criminal or terrorist networks.
4. Collaboration Networks:
o Analyzing relationships in scientific or professional collaborations.

5.Text Mining Process

Text mining is the process of extracting meaningful information and insights from large
volumes of unstructured text data. It is a crucial area of research due to the growing
availability of text data, such as social media content, online reviews, and documents.
The text mining process involves five main phases:
Phase 1: Text Pre-processing

This phase prepares the text for further analysis by cleaning and structuring it. Key steps
include:

1. Text Cleanup:
o Remove unnecessary information like HTML tags, comments, and
stopwords.
o Handle missing values, outliers, and inconsistencies (e.g., correcting "teh"
to "the").
o Example: Remove "%20" from URLs to clean up web page data.
2. Tokenization:
o Split text into smaller units (tokens), usually words or phrases, using spaces
or punctuation as delimiters.
3. Part of Speech (POS) Tagging:
o Label each word with its grammatical role (e.g., noun, verb, adjective).
o Helps identify entities like names, places, or titles.
4. Word Sense Disambiguation:
o Determine the correct meaning of words with multiple meanings based on
context.
o Example: "bank" could mean a financial institution or the side of a river.
5. Parsing:
o Create a parse tree to understand grammatical relationships between words.
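A brief sketch of the tokenization and POS-tagging steps using the NLTK library (assuming its tokenizer and tagger resources have been downloaded); the sentence is invented for illustration:

import nltk

# One-time resource downloads (assumed already available in practice)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "The tasty food at the new cafe was very good."

tokens = nltk.word_tokenize(text)  # tokenization: split text into word tokens
tags = nltk.pos_tag(tokens)        # POS tagging: label each token with its grammatical role
print(tokens)
print(tags)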

Phase 2: Feature Generation

Features are variables used to represent text data for machine learning models. Common
methods include:

1. Bag of Words:
o Represent documents as collections of word frequencies without
considering order.
o Example: A document with words ["tasty", "food", "good"] is represented
as a vector of frequencies.
2. Stemming:
o Reduce words to their root form (e.g., "speaking", "speaks" → "speak").
o Normalizes variations of words to a base form.
3. Removing Stop Words:
o Remove commonly used words (e.g., "a", "it", "the") that do not contribute
meaningful information.
4. Vector Space Model (VSM):
o Represent documents as vectors using metrics like TF-IDF (Term
Frequency-Inverse Document Frequency) to evaluate word importance.
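A short sketch of Bag of Words and TF-IDF feature generation, assuming scikit-learn is available; the three small documents are invented for illustration:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "tasty food and good service",
    "good food but slow service",
    "tasty dessert and good coffee",
]

# Bag of Words: raw term frequencies, word order ignored
bow = CountVectorizer(stop_words="english")
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())

# TF-IDF: down-weights words that appear in many documents
tfidf = TfidfVectorizer(stop_words="english")
print(tfidf.fit_transform(docs).toarray().round(2))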
Phase 3: Feature Selection

This phase reduces dimensionality by eliminating irrelevant or redundant features. Techniques include:

1. Dimensionality Reduction:
o Use methods like Principal Component Analysis (PCA) or Linear
Discriminant Analysis (LDA) to simplify data while retaining its key
characteristics.
2. N-gram Evaluation:
o Identify sequences of consecutive words (e.g., 2-grams: "tasty food", 3-
grams: "Crime Investigation Department").
3. Noise Detection and Outlier Evaluation:
o Identify unusual or irrelevant patterns for cleaning data.

Phase 4: Data Mining Techniques

This phase involves applying algorithms to structured data to extract patterns or insights.

1. Unsupervised Learning:
o Clustering: Group data into clusters with high intra-cluster similarity and
low inter-cluster similarity.
o Example: Identifying user groups based on similar behaviors.
2. Supervised Learning:
o Classification: Use labeled training data to categorize new data.
o Examples:
▪ Spam detection in emails.
▪ News article classification using algorithms like Naive Bayes and
SVMs.
3. Evolutionary Pattern Detection:
o Analyze trends over time, such as summarizing news events or identifying
research trends.
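A minimal sketch of the supervised (classification) step, assuming scikit-learn; the tiny labelled spam/ham training set is invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training data for spam detection
texts = ["win a free prize now", "claim your free reward",
         "meeting agenda for Monday", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]

# Naive Bayes classifier over TF-IDF features
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize waiting for you", "please review the report"]))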

Phase 5: Analyzing Results

Evaluate and interpret the outcomes of the text mining process:

1. Evaluate Outcome:
o Assess the effectiveness and accuracy of the results.
2. Interpret Results:
o Identify whether the results meet the objectives. If not, revisit earlier
phases.
3. Visualization:
o Create graphs, charts, or prototypes for better understanding.
4. Application:
o Use insights to drive improvements in industries, enterprises, or research.

6.Text Mining v/s Data Mining

Text Mining vs. Data Mining: Key Differences

Dimension             | Text Mining                                                                        | Data Mining
Nature of Data        | Unstructured data: words, phrases, sentences                                       | Structured data: numerical, alphabetical, and logical values
Language Used         | Many languages and dialects; extinct and newly discovered texts                    | Uniform numerical systems across the world
Clarity and Precision | Sentences can be ambiguous; sentiment may contradict the words                     | Numbers are precise and leave little room for ambiguity
Consistency           | Contradictions may exist between different parts of a text                         | Data inconsistency is analyzed through statistical methods
Sentiment             | Captures sentiments (clear, mixed, or ambiguous); spoken words add complexity      | Not applicable, as numerical data does not carry sentiment
Quality               | Affected by spelling errors, variations in proper nouns, and translation quality   | Issues like missing values, outliers, and statistical errors
Nature of Analysis    | Keyword-based searches, thematic co-existence, and sentiment mining                | Statistical and machine learning methods to analyze relationships and differences
7.Frequent Set mining and association rule mining

Frequent Itemset

A frequent itemset refers to a set of items that frequently appear together in a dataset. For
example, in the context of students studying, subjects like Python and Big Data Analytics
may often be chosen together by computer science students. In mining terms, a frequent
itemset is a subset of items that appears frequently within a dataset.

• Frequent Itemset Mining (FIM) is a data mining technique used to discover which itemsets appear most often in a dataset. This can be applied to various fields, such as identifying students who repeatedly show poor performance in exams.
• Frequent Subsequence refers to a sequence of patterns that occur frequently. For
example, purchasing a football may often follow the purchase of a sports kit.
• Frequent Substructure involves finding frequent structures such as graphs, trees,
or lattices. These may be used alongside itemsets or subsequences to detect
frequent patterns in more complex data.

Association Rule Mining

Association rule mining, or association analysis, is a key method in data mining. It aims
to find interesting relationships (or rules) between items in large datasets. These rules are
commonly used to understand patterns and uncover hidden relationships. For example, an
association rule might indicate that "if a student studies Python, they are likely to also
study Big Data Analytics."

• Objective: The main goal of association rule mining is to uncover strong rules
that indicate relationships between items.
• Association Rules: These are the relationships that emerge from analyzing
frequent itemsets. For example, if we find that "Python" and "Big Data Analytics"
are frequently studied together, an association rule can be formed:
"If a student studies Python, they are likely to study Big Data Analytics."
• Algorithms: The Mahout library includes a 'parallel frequent pattern growth'
algorithm, which is used to discover frequent patterns in large datasets in parallel,
making the process faster and more scalable.

In summary, frequent itemset mining focuses on identifying items that frequently appear together, and association rule mining helps in finding strong relationships between those items, thus enabling better decision-making and analysis.
Frequent Itemset Mining (FIM) Example:

Let's break it down step by step.

Step 1: Define Minimum Support

• We are given 5 transactions:
  o T1 = {Milk, Bread, Butter}
  o T2 = {Milk, Bread}
  o T3 = {Milk, Butter}
  o T4 = {Bread, Butter}
  o T5 = {Milk, Bread, Butter, Cheese}
• We choose a minimum support threshold of 60%. This means that an itemset should appear in at least 3 out of 5 transactions to be considered frequent.

Step 2: Identify Itemsets

We'll first identify all the individual items and pairs of items that appear in the
transactions.

1. Single Itemsets:
o {Milk}, {Bread}, {Butter}, {Cheese}
2. Two-itemsets (combinations of two items):
o {Milk, Bread}, {Milk, Butter}, {Milk, Cheese}, {Bread, Butter}, {Bread,
Cheese}, {Butter, Cheese}

Step 3: Calculate Support for Each Itemset

• Support for an itemset is the percentage of transactions in which the itemset appears.

Let’s calculate the support for each itemset:

1. Single Items:
o {Milk}: Appears in T1, T2, T3, T5 → 4 transactions
▪ Support = 4/5 = 80%
o {Bread}: Appears in T1, T2, T4, T5 → 4 transactions
▪ Support = 4/5 = 80%
o {Butter}: Appears in T1, T3, T4, T5 → 4 transactions
▪ Support = 4/5 = 80%
o {Cheese}: Appears in T5 → 1 transaction
▪ Support = 1/5 = 20%
2. Two-itemsets:
o {Milk, Bread}: Appears in T1, T2, T5 → 3 transactions
▪ Support = 3/5 = 60%
o {Milk, Butter}: Appears in T1, T3, T5 → 3 transactions
▪ Support = 3/5 = 60%
o {Bread, Butter}: Appears in T1, T4, T5 → 3 transactions
▪ Support = 3/5 = 60%
o {Milk, Cheese}: Appears in T5 → 1 transaction
▪ Support = 1/5 = 20%
o {Bread, Cheese}: Appears in T5 → 1 transaction
▪ Support = 1/5 = 20%
o {Butter, Cheese}: Appears in T5 → 1 transaction
▪ Support = 1/5 = 20%

Step 4: Filter Frequent Itemsets

We now filter the itemsets that have support greater than or equal to the minimum
threshold (60%).

• Frequent Itemsets (Support ≥ 60%):


o {Milk} (80%), {Bread} (80%), {Butter} (80%)
o {Milk, Bread} (60%), {Milk, Butter} (60%), {Bread, Butter} (60%)

Association Rule Mining Example:

Now, let’s use Association Rule Mining to find rules between these frequent itemsets.
An association rule is of the form:

• X → Y, meaning if X occurs, then Y is likely to occur.

Step 1: Define Minimum Confidence

• We set the minimum confidence threshold at 70%. This means that for a rule X
→ Y to be strong, confidence(X → Y) should be greater than or equal to 70%.

Step 2: Generate Association Rules from Frequent Itemsets

1. Rule 1: {Milk} → {Bread}


o Support({Milk, Bread}) = 60%
o Confidence({Milk} → {Bread}) = Support({Milk, Bread}) /
Support({Milk}) = 60% / 80% = 75%
▪ This rule is strong because the confidence (75%) is greater than the
minimum threshold of 70%.
2. Rule 2: {Milk} → {Butter}
o Support({Milk, Butter}) = 60%
o Confidence({Milk} → {Butter}) = Support({Milk, Butter}) /
Support({Milk}) = 60% / 80% = 75%
▪ This rule is also strong (75% confidence > 70%).
3. Rule 3: {Bread} → {Butter}
o Support({Bread, Butter}) = 60%
o Confidence({Bread} → {Butter}) = Support({Bread, Butter}) /
Support({Bread}) = 60% / 80% = 75%
▪ This rule is strong (75% confidence > 70%).

Summary of Results:

Frequent Itemsets:

• Single items: {Milk}, {Bread}, {Butter}


• Two-itemsets: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}

Association Rules:

• {Milk} → {Bread} (75% confidence)


• {Milk} → {Butter} (75% confidence)
• {Bread} → {Butter} (75% confidence)
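These support and confidence figures can be verified with a short Python sketch (a minimal illustration, not an optimized miner; note that it checks both directions of each frequent pair, so it also reports the reverse rules not listed above):

from itertools import combinations

transactions = [
    {"Milk", "Bread", "Butter"},           # T1
    {"Milk", "Bread"},                     # T2
    {"Milk", "Butter"},                    # T3
    {"Bread", "Butter"},                   # T4
    {"Milk", "Bread", "Butter", "Cheese"}  # T5
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / n

items = sorted(set().union(*transactions))

# Frequent 1-itemsets and 2-itemsets at min_support = 60%
frequent_1 = [{i} for i in items if support({i}) >= 0.6]
frequent_2 = [set(p) for p in combinations(items, 2) if support(p) >= 0.6]
print("Frequent 1-itemsets:", frequent_1)
print("Frequent 2-itemsets:", frequent_2)

# Association rules X -> Y from the frequent 2-itemsets, min_confidence = 70%
for pair in frequent_2:
    for x in pair:
        y = (pair - {x}).pop()
        confidence = support(pair) / support({x})
        if confidence >= 0.7:
            print(f"{{{x}}} -> {{{y}}}: confidence = {confidence:.0%}")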

8.Text Mining Applications

Text mining is a powerful tool for extracting valuable insights from unstructured text
data. Here are some of the key applications across various sectors:

1. Marketing:

• Voice of the Customer: Text mining allows companies to analyze customer feedback, complaints, and preferences in their raw form (e.g., social media, reviews, blogs) to understand consumer behavior.
• Social Personas: By clustering social media data (tweets, blogs, etc.), businesses
can segment customers and predict behaviors, enhancing their marketing
strategies.
• Listening Platform: This tool gathers real-time data from social media, blogs, and
reviews to filter out noise and identify true consumer sentiment. This helps brands
improve product marketing and customer service.
• Customer Call Center Analysis: By analyzing call center data for recurring
complaints, text mining helps identify patterns, enabling businesses to make
informed decisions and proactively manage product issues.

2. Business Operations:

• Social Network Analysis: Text mining applied to emails, blogs, and social media
can help gauge the emotional state of employees, identifying early signs of
dissatisfaction and helping organizations take proactive steps.
• Investment Psychology: Text mining can be used to analyze social media
discussions to understand market sentiment, potentially predicting market trends
and improving investment returns.

3. Legal:

• Case History Search: Lawyers use text mining to search case histories, legal
documents, and laws for relevant information to strengthen their legal arguments.
• E-discovery: In legal processes, text mining tools are embedded in e-discovery
platforms to minimize risk while sharing legally mandated documents.
• Healthcare Insights: Text mining of case histories, testimonies, and notes can
help identify critical factors, such as healthcare morbidities, to predict and prevent
high-cost injuries or medical conditions.

4. Governance and Politics:

• Political Impact of Social Media: Text mining and social network analysis can
measure the emotional states and moods of citizens, which can influence political
strategies (e.g., predicting election outcomes through sentiment analysis).
• Geopolitical Security: Text mining can process real-time internet chatter to
identify emerging threats, aiding in national security.
• Research Meta-analysis: Academics and researchers can use text mining to
identify trends and patterns across large datasets of research papers, helping them
pinpoint new research directions.

10.K-Means Clustering: Explanation and Example

K-Means clustering is a popular unsupervised machine learning algorithm used for partitioning data into distinct clusters. The objective of K-Means is to group data points such that data points within the same group (or cluster) are more similar to each other than to those in other groups. This is achieved by minimizing the variance within each cluster.

Key Steps in K-Means Clustering:

1. Initialization: Choose the number of clusters (k) and initialize k centroids randomly.
2. Assignment: Assign each data point to the nearest centroid based on distance
(typically Euclidean distance).
3. Update Centroids: After assigning the data points, recalculate the centroids as the
mean of the data points in each cluster.
4. Repeat: Repeat the assignment and centroid update steps until the centroids do not
change significantly (i.e., convergence).
Components of K-Means Algorithm:

• k: The number of clusters you want to divide the data into.


• Centroid: A point that represents the center of a cluster.
• Euclidean Distance: The most commonly used distance metric to measure the
similarity between data points.

Example: K-Means Clustering

Let's use a simple example of clustering students based on their exam scores.

Dataset:

We have a small dataset of students' scores in two subjects: Mathematics (Math) and
Science (Sci):

Student Math Score Science Score

A 60 70

B 80 85

C 90 88

D 50 60

E 55 65

F 75 80

We aim to cluster these students into 2 groups (k=2).

Step 1: Initialization

• We randomly choose 2 centroids. Let's assume:


o Centroid 1 (C1): (60, 70) — Initial centroid from Student A.
o Centroid 2 (C2): (80, 85) — Initial centroid from Student B.
Step 2: Assignment

Now, we calculate the Euclidean distance between each student and the centroids to
assign students to the closest centroid.

1. Distance from C1 (60, 70):
   o Student A: √((60-60)² + (70-70)²) = 0
   o Student B: √((80-60)² + (85-70)²) = 25.00
   o Student C: √((90-60)² + (88-70)²) = 34.99
   o Student D: √((50-60)² + (60-70)²) = 14.14
   o Student E: √((55-60)² + (65-70)²) = 7.07
   o Student F: √((75-60)² + (80-70)²) = 18.03
2. Distance from C2 (80, 85):
   o Student A: √((60-80)² + (70-85)²) = 25.00
   o Student B: √((80-80)² + (85-85)²) = 0
   o Student C: √((90-80)² + (88-85)²) = 10.44
   o Student D: √((50-80)² + (60-85)²) = 39.05
   o Student E: √((55-80)² + (65-85)²) = 32.02
   o Student F: √((75-80)² + (80-85)²) = 7.07

Step 3: Assign Students to Clusters

• Cluster 1 (C1): Students A, D, E (nearest to C1).


• Cluster 2 (C2): Students B, C, F (nearest to C2).

Step 4: Update Centroids

• Centroid 1 (C1): Calculate the new centroid as the average of the points in
Cluster 1.
o C1_new = ((60 + 50 + 55) / 3, (70 + 60 + 65) / 3) = (55, 65).
• Centroid 2 (C2): Calculate the new centroid as the average of the points in
Cluster 2.
o C2_new = ((80 + 90 + 75) / 3, (85 + 88 + 80) / 3) = (81.67, 84.33).

Step 5: Repeat Assignment and Update

We repeat the assignment and update steps with the new centroids (C1_new and
C2_new).

1. New Distance from C1_new (55, 65):
   o Student A: √((60-55)² + (70-65)²) = 7.07
   o Student B: √((80-55)² + (85-65)²) = 32.02
   o Student C: √((90-55)² + (88-65)²) = 41.88
   o Student D: √((50-55)² + (60-65)²) = 7.07
   o Student E: √((55-55)² + (65-65)²) = 0
   o Student F: √((75-55)² + (80-65)²) = 25.00
2. New Distance from C2_new (81.67, 84.33):
   o Student A: √((60-81.67)² + (70-84.33)²) = 25.98
   o Student B: √((80-81.67)² + (85-84.33)²) = 1.80
   o Student C: √((90-81.67)² + (88-84.33)²) = 9.10
   o Student D: √((50-81.67)² + (60-84.33)²) = 39.94
   o Student E: √((55-81.67)² + (65-84.33)²) = 32.94
   o Student F: √((75-81.67)² + (80-84.33)²) = 7.95

Step 6: Convergence

• Since every student remains assigned to the same cluster, the centroids no longer change (convergence) and the process stops here.
• Final clusters:
o Cluster 1 (C1): Students A, D, E.
o Cluster 2 (C2): Students B, C, F.
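The same clustering can be reproduced with scikit-learn (a minimal sketch; its initialization differs from the manual choice above, but on this small dataset it should yield the same two groups):

import numpy as np
from sklearn.cluster import KMeans

# Student scores: [Math, Science] for A, B, C, D, E, F
X = np.array([[60, 70], [80, 85], [90, 88], [50, 60], [55, 65], [75, 80]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:   ", kmeans.labels_)          # which cluster each student belongs to
print("Cluster centroids:", kmeans.cluster_centers_) # approximately (55, 65) and (81.67, 84.33)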
