DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SYLLABUS
Course Name: DATA WAREHOUSING AND DATA MINING Course Code: C303
Year/ Sem: III B TECH I SEM Regulation: R20
Admitted Batch: 2020-21 Academic Year: 2022-23
Course Objectives:
● Introduce the basic concepts and techniques of data warehousing and data mining.
● Examine the types of data to be mined and apply pre-processing methods to raw data.
● Understand and analyze supervised and unsupervised models.
● Understand the issues and solutions involved in discovering interesting patterns.
● Understand various unsupervised models and estimate the accuracy of the algorithms.
SYLLABUS
UNIT - I
Data Warehousing and Online Analytical Processing: Data Warehouse: Basic concepts, Data
Warehouse Modelling: Data Cube and OLAP, Data Warehouse Design and Usage, Data Warehouse
Implementation, Introduction: Why and What is data mining, What kinds of data need to be mined and
what kinds of patterns can be mined, Which technologies are used, Which kinds of applications are targeted.
UNIT - II
Data Pre-processing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation
and Data Discretization.
UNIT - III
Classification: Basic Concepts, General Approach to solving a classification problem, Decision Tree
Induction: Attribute Selection Measures, Tree Pruning, Scalability and Decision Tree Induction, Visual
Mining for Decision Tree Induction.
UNIT - IV
Association Analysis: Problem Definition, Frequent Itemset Generation, Rule Generation: Confidence-Based
Pruning, Rule Generation in Apriori Algorithm, Compact Representation of Frequent Itemsets,
FP-Growth Algorithm.
UNIT - V
Cluster Analysis: Overview, Basics and Importance of Cluster Analysis, Clustering techniques, Different
Types of Clusters; K-means: The Basic K-means Algorithm, K-means Additional Issues, Bisecting
K-means.
Text Books:
1. Data Mining: Concepts and Techniques, 3/e, Jiawei Han, Micheline Kamber, Elsevier, 2011.
2. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson, 2012.
Reference Books:
1. Data Mining Techniques and Applications: An Introduction, Hongbo Du, Cengage Learning.
2. Data Mining, Vikram Pudi and P. Radha Krishna, Oxford Publisher.
3. Data Mining and Analysis - Fundamental Concepts and Algorithms, Mohammed J. Zaki, Wagner Meira, Jr, Oxford.
4. Data Warehousing Data Mining & OLAP, Alex Berson, Stephen Smith, TMH.
5. http://onlinecourses.nptel.ac.in/noc18_cs14/preview (NPTEL course by Prof. Pabitra Mitra)
6. http://onlinecourses.nptel.ac.in/noc17_mg24/preview (NPTEL course by Dr. Nandan Sudarshanam & Dr. Balaraman Ravindran)
7. http://www.saedsayad.com/data_mining_map.htm
COURSE COORDINATOR HEAD OF THE DEPARTMENT
CO-PO-PSO MAPPING
Course Name: DATA WAREHOUSING AND DATA MINING Course Code: C303
Year/ Sem: III YEAR - I SEM Regulation: R20
Admitted Batch: 2020-21 Academic Year: 2022-23
Course Coordinator: Mrs. K. S. Rupa
COURSE OUTCOMES
CO DESCRIPTION LEVEL
C303.1 Summarize the basic concepts of data mining. K2
C303.2 Describe various data pre-processing procedures and their application scenarios. K2
C303.3 Use decision trees to solve classification problems. K3
C303.4 Illustrate the alternative classification techniques on data. K3
C303.5 Discuss association analysis on frequent itemsets. K3
PROGRAM SPECIFIC OUTCOMES
PSO1 Graduates exhibit knowledge of basic sciences and skills in engineering specializations such as information security, cloud computing, networking, software engineering and data analytics.
PSO2 Graduates can adapt to evolving technologies for design and development of full stack applications, exploring with optimal programming skills.
CO-PO-PSO
CO      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2
C303.1  3    3    2    2    3    -    -    -    -    -     3     2     -     -
C303.2  3    2    2    3    3    -    -    -    -    -     3     2     -     3
C303.3  3    2    2    2    3    -    -    -    -    -     3     2     -     -
C303.4  3    3    -    3    3    -    -    -    -    -     3     3     3     -
C303.5  3    3    2    3    3    -    -    -    -    -     3     3     -     3
AVG     3    2.6  2    2.6  3    -    -    -    -    -     3     2.4   3     3
COURSE COORDINATOR HEAD OF THE DEPARTMENT
LECTURE PLAN
Course Name: DATA WAREHOUSING AND DATA MINING Course Code: C303
Year/ Sem: III B TECH I SEM Regulation: R20
Admitted Batch: 2020-21 Academic Year: 2022-23
Number of Lectures per week: 05
Course Coordinator: Mrs. K.Santoshi Rupa
Course handled: Section A - Mrs. M.Srividya
Course handled: Section B - Mrs. K.Santoshi Rupa
Course handled: Section C - Mrs. M.Srividya
Lecture Plan :
UNIT - I
Data Warehousing and Online Analytical Processing: Data Warehouse: Basic concepts, Data Warehouse
Modeling: Data Cube and OLAP, Data Warehouse Design and Usage, Data Warehouse Implementation,
Introduction: Why and What is data mining, What kinds of data need to be mined and what kinds of
patterns can be mined, Which technologies are used, Which kinds of applications are targeted.
Objective: To help students understand the basic concepts and techniques of data warehousing.
Session No. Topics to be covered Reference Teaching Aids
1. Data Warehouse: Basic concepts T1:125-134 CB
2. Data Warehouse Modeling T1:135-136 CB/PPT
3. Data cube T1:136-137 CB
4. Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models T1:13-141 PPT
5. OLAP Operations T1:146 CB
6. Data Warehouse Design and Usage T1:150-155 CB
7. Data Warehouse Implementation T1:156-163 CB
8. OLAP Architectures T1:164-165 CB/PPT
9. Why and What is data mining T1:1-4 CB
10. Web mining T1:4-6 CB
11. KDD in Ml T1:6-8 CB
12. What kinds of data need to be mined and what kinds of patterns can be mined T1:8-15 CB
13. What Kinds of Patterns Can Be Mined T1:15-21 CB
14. Which technologies are used T1:23-27 PPT
15. Which kinds of applications are targeted T1:27-28 CB
UNIT – II
Data Pre-processing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation
and Data Discretization.
Objective: To understand the types of data to be mined and apply pre-processing methods to raw data.
Session No. Topics to be covered Reference Teaching Aids
16. Data Pre-processing: An Overview T1:84-85 CB
17. Major Tasks in Data Preprocessing T1:85-87 CB
18. Data Cleaning T1:88-91 CB
19. Data Cleaning as a Process T1:91-92 CB
20. Data Integration T1:93-94 CB
21. Redundancy and Correlation Analysis T1:94-98 CB
22. Tuple Duplication; Data Value Conflict Detection and Resolution T1:98-99 CB
23. Data Reduction T1:99-101 CB/PPT
24. Principal Components Analysis T1:102-103 CB
25. Attribute Subset Selection; Regression and Log-Linear Models: Parametric Data Reduction; Histograms T1:103-108 CB
26. Clustering; Sampling; Data Cube Aggregation T1:108-110 CB
27. Data Transformation and Data Discretization. T1:111-115 CB
28. Discretization by Histogram Analysis; Discretization by Cluster, Decision Tree T1:115-116 CB
29. Correlation Analyses; Concept Hierarchy Generation for Nominal Data T1:116-119 PPT
UNIT - III
Classification: Basic Concepts, General Approach to solving a classification problem, Decision Tree
Induction: Attribute Selection Measures, Tree Pruning, Scalability and Decision Tree Induction, Visual
Mining for Decision Tree Induction
Objective: To understand and analyze supervised and unsupervised models
Session No. Topics to be covered Reference Teaching Aids
30. Classification: Basic Concepts T1:327, T2:193-195 CB
31. General Approach to solving a classification problem T1:328-329 CB
32. Decision Tree Induction T1:330-335 PPT
33. Attribute Selection Measures T1:336-343 CB
34. Decision Tree Induction Example T1:336-343 CB
35. Tree Pruning T1:344-346 CB
36. Scalability and Decision Tree Induction T1:347-348 CB
37. Visual Mining for Decision Tree Induction T1:348-350 CB
38. Rule-Based Classification T1:355-358 CB
39. Nearest Neighbor Classifiers T2:208-210 CB
40. * ID3 Algorithm R1:192-194 PPT
41. * Metrics for Evaluating Classifier Performance T1:364-372 CB/PPT
42. * Techniques to Improve Classification Accuracy T1:377-382 CB/PPT
UNIT - IV
Association Analysis: Problem Definition, Frequent Itemset Generation, Rule Generation: Confidence-Based
Pruning, Rule Generation in Apriori Algorithm, Compact Representation of Frequent Itemsets,
FP-Growth Algorithm.
Objective: To understand the issues and solutions involved in discovering interesting patterns.
Session No. Topics to be covered Reference Teaching Aids
43. Association Analysis: Problem Definition T2:358-359 CB
44. Frequent Itemset Generation T2:362-363 CB
45. The Apriori Principle; Frequent Itemset Generation in the Apriori Algorithm T2:363-367 CB
46. Candidate Generation and Pruning T2:368-372 CB
47. Support Counting T2:373-376 CB
48. Computational Complexity T2:377-379 CB
49. Rule Generation T2:380 PPT
50. Confidence-Based Pruning T2:380-381 CB
51. Rule Generation in Apriori Algorithm T2:381-382 CB
52. An Example: Congressional Voting Records T2:382-383 CB
53. Compact Representation of Frequent Itemsets T2:384-386 CB
54. FP-Growth Algorithm T2:393-397 CB
UNIT - V
Cluster Analysis: Overview, Basics and Importance of Cluster Analysis, Clustering techniques, Different
Types of Clusters; K-means: The Basic K-means Algorithm, K-means Additional Issues, Bisecting
K-means.
Objective: To understand various unsupervised models and estimate the accuracy of the algorithms.
Session No. Topics to be covered Reference Teaching Aids
55. Cluster Analysis: Overview T2:525-527 CB
56. Different Types of Clusterings T2:528-529 CB
57. Different Types of Clusters T2:529-530 CB
58. Different Types of Clusters T2:531-533 CB
59. K-means: The Basic K-means Algorithm T2:534-543 CB
60. K-means Additional Issues T2:544-546 CB
61. Bisecting K-means T2:547-548 PPT
62. K-means and Different Types of Clusters T2:548-549 CB
63. Strengths and Weaknesses; K-means as an Optimization Problem T2:549-553 CB
64. Hierarchical Methods T1:457-467 CB
65. * Agglomerative Hierarchical Clustering T2:554-564 CB
66. * DBSCAN T2:565-569 CB
* Session duration: 50 mins
* CB: CHALK & BOARD
TEXT BOOKS:
1. Data Mining: Concepts and Techniques, 3/e, Jiawei Han, Micheline Kamber, Elsevier, 2011.
2. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson, 2012.
REFERENCE BOOKS:
1. Data Mining Techniques and Applications: An Introduction, Hongbo Du, Cengage Learning.
2. Data Mining, Vikram Pudi and P. Radha Krishna, Oxford Publisher.
3. Data Mining and Analysis - Fundamental Concepts and Algorithms, Mohammed J. Zaki, Wagner Meira, Jr, Oxford.
e-Resources:
1. http://onlinecourses.nptel.ac.in/noc17_mg24/preview
COURSE COORDINATOR HEAD OF THE DEPARTMENT