
COMP1942 Question Paper

This document is a question paper for the COMP1942 Exploring and Visualizing Data midterm examination. It contains 5 compulsory short questions worth 20 marks each, and an optional bonus question worth 10 additional marks. The questions cover topics like frequent itemset mining, FP-growth algorithm, hierarchical clustering, k-means clustering, and decision tree induction using C4.5. Students are instructed to answer all compulsory questions and have the option to answer the bonus question for extra marks.


COMP1942 Exploring and Visualizing Data (Spring Semester 2016)


Midterm Examination (Question Paper)
Date: 17 March, 2016 (Thu)
Time: 9:00-10:15
Duration: 1 hour 15 minutes

Student ID: __________________ Student Name: __________________

Seat No. :__________________

Instructions:
(1) Please answer all questions in Part A in the answer sheet.
(2) You can optionally answer the bonus question in Part B in the answer sheet. You can obtain additional
marks for the bonus question if you answer it correctly.
(3) You can use a calculator.


Part A (Compulsory Short Questions)


Q1 (20 Marks)
(a) Given a dataset with the following transactions in binary format, and a support threshold of 2.
A B C D E
1 0 0 1 0
1 0 0 1 1
0 0 1 0 0
1 0 1 1 1
1 0 1 0 1

(i) What is the support of the rule "{A, E} → C"?
(ii) What is the confidence of the rule "{A, E} → C"?
(iii) What is the lift ratio of the rule "{A, E} → C"?
(iv) What are the frequent itemsets? You do not need to give the frequency of each frequent itemset.
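[Editor's note, not part of the paper: the standard definitions behind (i)-(iii) can be sketched in a few lines of Python. Support is computed here as an absolute count, matching the count-based threshold in the question; whether the course expects a count or a fraction is an assumption.]

```python
# Sketch of the standard support / confidence / lift definitions, applied to
# the five transactions of part (a). Support is an absolute count here.
transactions = [
    {"A", "D"},            # 1 0 0 1 0
    {"A", "D", "E"},       # 1 0 0 1 1
    {"C"},                 # 0 0 1 0 0
    {"A", "C", "D", "E"},  # 1 0 1 1 1
    {"A", "C", "E"},       # 1 0 1 0 1
]

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

X, Y = {"A", "E"}, {"C"}
n = len(transactions)
supp = support(X | Y)               # support of the rule X -> Y
conf = support(X | Y) / support(X)  # confidence of X -> Y
lift = conf / (support(Y) / n)      # lift ratio of X -> Y
```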
(b) This part is independent of part (a).
Suppose that we are also given another dataset with some transactions in binary format, and the support
threshold = 2. Finally, we obtain the set S of all frequent itemsets equal to
{ {A}, {B}, {C}, {D},
{A, B}, {A, C}, {A, D}, {B, C}, {B, D},
{A, B, C}, {A, B, D} }
There are many possible datasets which have the same set S as the set of all frequent itemsets.
Please give one possible dataset which has the minimum number of transactions in binary format.
Assume that each transaction in this dataset contains A, B, C, D or E.
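[Editor's note, not part of the paper: a candidate answer to part (b) can be sanity-checked by brute-forcing all frequent itemsets of a dataset and comparing against S. The four-transaction candidate below is purely illustrative; it reproduces S but is not claimed to be of minimum size.]

```python
from itertools import combinations

# Brute-force frequent-itemset miner (support threshold = 2), used to check
# whether a candidate dataset has exactly S as its set of frequent itemsets.
# The candidate is illustrative only; it is not claimed to be minimal.
candidate = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "D"}]

def frequent_itemsets(dataset, threshold=2):
    items = sorted(set().union(*dataset))
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if sum(set(combo) <= t for t in dataset) >= threshold:
                result.add(frozenset(combo))
    return result

S = {frozenset(s) for s in [
    {"A"}, {"B"}, {"C"}, {"D"},
    {"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"},
    {"A", "B", "C"}, {"A", "B", "D"},
]}
assert frequent_itemsets(candidate) == S
```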

Q2 (20 Marks)

[Figure: FP-tree T, with a header table listing the items f, c, b and a, each with a head-of-node-link pointer into the tree. The node counts visible in the extracted figure are f:10, c:6 and b:4 in the upper level below the root, c:3 and b:4 in the middle, and a:3 and a:4 at the bottom; the exact parent-child edges are not recoverable from the extracted text.]

Consider the above FP-tree T with a support threshold of 3.


Apply the FP-growth(T, NULL) algorithm and generate all the conditional FP-trees.
What are the frequent patterns generated?


Q3 (20 Marks)
(a) We are doing clustering with a dendrogram according to a distance measurement. Given a set A of
points and another set B of points, we denote the distance between A and B by D(A, B) which is
calculated based on the given distance measurement.
Suppose that we have the following information.
- There are 5 data points, namely a, b, c, d, and e.
- According to this dendrogram, if we want to find two clusters, the two clusters are {a, b} and {c, d, e}, where the distance between these two clusters according to the distance measurement is 5.0.
- D({c, d}, {e}) = 4.0, D({c}, {d}) = 3.0, and D({a}, {b}) = 2.0.
Is it always true that we can draw the corresponding dendrogram? If yes, please draw the dendrogram; in this case, you are required to specify the distance metric in the dendrogram. If no, please explain what additional information we need to draw the dendrogram; in this case, please give the minimum possible additional information.
(b) We are given five data points.
a: (1, 2), b: (2, 4), c: (7, 6), d: (6, 9), e: (8, 9)
Suppose that there are two clusters. The first cluster contains points a and b while the second cluster
contains points c, d and e.
(i) (1) What is the center of the first cluster if we use the centroid linkage as a distance measurement?
(2) What is the center of the second cluster if we use the centroid linkage as a distance measurement?
(ii) Consider the agglomerative approach for hierarchical clustering.
Suppose that these two clusters are merged.
(1) What is the center of the merged cluster if we use the centroid linkage as a distance measurement?
(2) What is the center of the merged cluster if we use the median linkage as a distance measurement?
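[Editor's note, not part of the paper: under the standard definitions assumed here, centroid linkage places the center of a merged cluster at the mean of all member points, while median linkage places it at the midpoint of the two centers being merged. A minimal sketch on the five points above:]

```python
# Centroid linkage: center = mean of all points in the (merged) cluster.
# Median linkage: center of a merged cluster = midpoint of the two old centers.
# (Standard textbook definitions, assumed here.)
def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))

def median_center(center1, center2):
    return ((center1[0] + center2[0]) / 2, (center1[1] + center2[1]) / 2)

cluster1 = [(1, 2), (2, 4)]          # points a, b
cluster2 = [(7, 6), (6, 9), (8, 9)]  # points c, d, e

c1, c2 = centroid(cluster1), centroid(cluster2)
merged_centroid = centroid(cluster1 + cluster2)  # centroid linkage
merged_median = median_center(c1, c2)            # median linkage
```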

Q4 (20 Marks)

(a) Consider the original k-means clustering algorithm. Suppose that at the beginning, k means are randomly chosen (not necessarily equal to the data points). At the beginning, we have these k means, and each data point belongs to one of the k clusters with these means. Consider a cluster. Is it always true that if there exists at least one data point which belongs to this cluster at the beginning, there is at least one data point which belongs to this cluster after a lot of iterations (where each iteration corresponds to an update on the k means in the algorithm)? If yes, please show the correctness of this statement. Otherwise, please give an example showing that this statement is incorrect.
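[Editor's note, not part of the paper: the phenomenon this question probes can be explored with a small Lloyd-style simulation. The 1-D points and initial means below are assumptions chosen for illustration, not the model answer.]

```python
# One Lloyd-style iteration of k-means in 1-D: assign each point to its
# nearest mean, then move each mean to the centroid of its cluster
# (a mean with an empty cluster is left where it is).
def assign(points, means):
    clusters = [[] for _ in means]
    for x in points:
        i = min(range(len(means)), key=lambda i: abs(x - means[i]))
        clusters[i].append(x)
    return clusters

def update(clusters, means):
    return [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]

points = [-12.0, -10.0, 10.0, 12.0]
means = [0.0, -21.0, 21.0]        # initial means, not data points
first = assign(points, means)     # cluster 0 holds {-10, 10}
means = update(first, means)      # means move to [0, -12, 12]
second = assign(points, means)    # cluster 0 loses both points
```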

(b) Consider the forgetful sequential k-means clustering algorithm. Let a be a constant defined in this
algorithm.
(i) Please write down the steps for Algorithm forgetful sequential k-means clustering.
(ii) Consider a cluster found in the algorithm containing n examples, where its initial mean is equal to m_0.
Let x_j be the j-th example added to this cluster, and let m_j be the mean vector of this cluster after
the first j examples are added, for j = 1, 2, …, n. We can express m_n in the following form:

    m_n = X · m_0 + Σ_{p=1}^{n} Y · x_p

where X and Y are some expressions.


Please show that m_n can be expressed in this form. After you show this statement, please also write
down what X is and what Y is.
(You are not required to memorize the formula for this question. You just need to show how you
obtain the above expression and finally you can obtain X and Y.)
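[Editor's note, not part of the paper: one can check numerically that the forgetful update m_j = m_{j-1} + a·(x_j - m_{j-1}) unrolls into the required shape. The closed form used below, X = (1-a)^n with Y = a(1-a)^{n-p} inside the sum, is the commonly quoted result, stated here as an assumption for illustration; the constants a, m_0 and the x_p values are arbitrary test data.]

```python
# Numerical check that the forgetful update
#     m_j = m_{j-1} + a * (x_j - m_{j-1})
# unrolls to  m_n = X * m_0 + sum_{p=1}^{n} Y_p * x_p
# with X = (1 - a)**n and Y_p = a * (1 - a)**(n - p)  (assumed closed form).
a, m0 = 0.3, 2.0
xs = [1.0, -0.5, 4.0, 2.5, 0.0]  # arbitrary illustrative examples
n = len(xs)

m = m0
for x in xs:                     # recursive (sequential) update
    m = m + a * (x - m)

closed = (1 - a) ** n * m0 + sum(
    a * (1 - a) ** (n - p) * xs[p - 1] for p in range(1, n + 1)
)
assert abs(m - closed) < 1e-12
```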


Q5 (20 Marks)

The following shows a history of users with attributes "Has_Bicycle", "Age" and "Income". The last
column indicates whether each user will buy a hoverboard or not. You cannot use XLMiner in this
question.
No. Has_Bicycle Age Income Buy_Hoverboard
1 no young fair yes
2 no young high yes
3 yes old fair yes
4 yes middle fair yes
5 no young fair no
6 no middle low no
7 yes old low no
8 yes young low no

(a) We want to train a C4.5 decision tree classifier to predict whether a new user will buy a hoverboard or
not. We define the value of attribute Buy_Hoverboard to be the label of a record.
(i) Please find a C4.5 decision tree according to the above example. In the decision tree, whenever
we process (1) a node containing at least 80% of records with the same label or (2) a node containing
at most 2 records, we stop processing this node for splitting.
(ii) Consider a new young user with a bicycle whose income is fair. Please predict whether this new
user will buy a hoverboard or not.
(b) What is the difference between the C4.5 decision tree and the ID3 decision tree? Why is there a
difference?
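[Editor's note, not part of the paper: C4.5 selects split attributes by gain ratio, i.e. information gain divided by the split information of the attribute. The helper below is an illustration of that criterion on the table above, not a model solution to the question.]

```python
from math import log2
from collections import Counter

# Illustrative helpers for the C4.5 split criterion (gain ratio),
# applied to the Buy_Hoverboard table above.
records = [
    ("no",  "young",  "fair", "yes"), ("no",  "young",  "high", "yes"),
    ("yes", "old",    "fair", "yes"), ("yes", "middle", "fair", "yes"),
    ("no",  "young",  "fair", "no"),  ("no",  "middle", "low",  "no"),
    ("yes", "old",    "low",  "no"),  ("yes", "young",  "low",  "no"),
]
ATTRS = {"Has_Bicycle": 0, "Age": 1, "Income": 2}

def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(attr):
    idx = ATTRS[attr]
    base = entropy([r[-1] for r in records])
    remainder = 0.0
    for value in set(r[idx] for r in records):
        subset = [r[-1] for r in records if r[idx] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

def gain_ratio(attr):
    split_info = entropy([r[ATTRS[attr]] for r in records])
    return info_gain(attr) / split_info if split_info else 0.0
```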


Part B (Bonus Question)


Note: The following bonus question is an OPTIONAL question. You can decide whether you will answer
it or not.

Q6 (10 Additional Marks)


We are given four items, namely A, B, C and D. Their corresponding unit profits are pA, pB, pC and pD.

The following shows five transactions with these items. Each row corresponds to a transaction, where a
non-negative integer shown in the row is the total number of occurrences of the corresponding item in
the transaction.
A B C D
0 0 3 2
3 4 0 0
0 0 1 3
1 0 3 5
6 0 0 0
The frequency of an itemset in a row is defined to be the minimum of the number of occurrences of all items
in the itemset. For example, itemset {C, D} in the first row has frequency = 2. But, itemset {C, D} in the
third row has frequency = 1.
The frequency of an itemset in the dataset is defined to be the sum of the frequencies of the itemset in all
rows in the dataset. For example, itemset {C, D} has frequency = 2+0+1+3+0 = 6.
Define a function f on an itemset s. This function will be specified later. One example of this function is
f(s) = Σ_{i∈s} p_i. In this example, if s = {C, D}, then f(s) = pC + pD.
The profit of an itemset s in the dataset is defined to be the product of the frequency of this itemset in the
dataset and f(s).
For example, itemset {C, D} has profit = 6 · f({C, D}).
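[Editor's note, not part of the paper: the frequency and profit definitions above can be made concrete in a short sketch. The function f is passed in as a parameter, since the paper specifies it differently in parts (a) and (b); the f_sum shown here is the part (b) choice with the part (b) unit profits.]

```python
# Frequency of an itemset = sum over rows of the minimum occurrence count
# among its items; profit of an itemset = frequency * f(itemset).
rows = [
    {"A": 0, "B": 0, "C": 3, "D": 2},
    {"A": 3, "B": 4, "C": 0, "D": 0},
    {"A": 0, "B": 0, "C": 1, "D": 3},
    {"A": 1, "B": 0, "C": 3, "D": 5},
    {"A": 6, "B": 0, "C": 0, "D": 0},
]

def frequency(itemset):
    return sum(min(row[i] for i in itemset) for row in rows)

def profit(itemset, f):
    return frequency(itemset) * f(itemset)

unit_profit = {"A": 5, "B": 10, "C": 6, "D": 4}   # values from part (b)
f_sum = lambda s: sum(unit_profit[i] for i in s)  # f(s) = sum of p_i over i in s
```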
(a) Assume that we adopt function f such that f(s) = (Σ_{i∈s} p_i)/|s|, where |s| denotes the number of items in s.
Suppose that we know that pA = 10, pB = 10, pC = 10 and pD = 10.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code (or steps) and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.
(b) Assume that we adopt function f such that f(s) = Σ_{i∈s} p_i.
Suppose that we know that pA = 5, pB = 10, pC = 6 and pD = 4.
We want to find all itemsets with profit at least 50.
Can the Apriori Algorithm be adapted to find these itemsets?
If yes, please write down the pseudo-code (or steps) and illustrate it with the above example.
If no, please explain why the Apriori Algorithm cannot be adapted. In this case, please also design
an algorithm, write down the pseudo-code and illustrate it with the above example.

End of Paper
