GUJARAT TECHNOLOGICAL UNIVERSITY
Bachelor of Engineering
Subject Code: 3160714
DATA MINING
6th SEMESTER
Type of course: Undergraduate (Elective)
Prerequisite: NA
Rationale: NA
Teaching and Examination Scheme:
Teaching Scheme   Credits   Theory Marks      Practical Marks   Total
L    T    P       C         ESE (E)  PA (M)   ESE (V)  PA (I)   Marks
3    0    2       4         70       30       30       20       150
Content:
Sr. No. Content Total Hrs. % Weightage
1 Introduction to data mining (DM): 3 10
Motivation for Data Mining - Data Mining-Definition and Functionalities –
Classification of DM Systems - DM task primitives - Integration of a Data
Mining system with a Database or a Data Warehouse - Issues in DM – KDD
Process
2 Data Pre-processing: 4 15
Data summarization, data cleaning, data integration and transformation, data
reduction, data discretization and concept hierarchy generation, feature
extraction, feature transformation, feature selection, introduction to
Dimensionality Reduction, CUR decomposition
3 Concept Description, Mining Frequent Patterns, Associations and Correlations: 10 20
What is concept description? - Data Generalization and summarization-based
characterization - Attribute relevance - class comparisons, Basic concept,
efficient and scalable frequent item-set mining methods, mining various kinds
of association rules, from association mining to correlation analysis,
Advanced Association Rule Techniques, Measuring the Quality of Rules.
4 Classification and Prediction: 10 20
Classification vs. prediction, Issues regarding classification and prediction,
Statistical-Based Algorithms, Distance-Based Algorithms, Decision Tree-
Based Algorithms, Neural Network-Based Algorithms, Rule-Based
Algorithms, Combining Techniques, accuracy and error measures, evaluation
of the accuracy of a classifier or predictor. Neural Network Prediction
methods: Linear and nonlinear regression, Logistic Regression. Introduction to
tools such as DB Miner / WEKA / DTREG DM Tools
5 Cluster Analysis: 10 20
Clustering: Problem Definition, Clustering Overview, Evaluation of
Clustering Algorithms, Partitioning Clustering -K-Means Algorithm, K-
Means Additional issues, PAM Algorithm; Hierarchical Clustering –
Agglomerative Methods and divisive methods, Basic Agglomerative
Hierarchical Clustering, Strengths and Weaknesses; Outlier Detection,
Clustering high dimensional data, clustering Graph and Network data.
6 Web mining and other data mining: 5 15
Web Mining: Introduction to Web Mining, Web content mining, Web usage
mining, Web Structure mining, Web log structure and issues regarding web
logs, Spatial Data Mining, Temporal Mining, and Multimedia Mining.
Applications of Distributed and parallel Data Mining.
Suggested Specification table with Marks (Theory):
Distribution of Theory Marks
R Level U Level A Level N Level E Level C Level
10 20 15 15 5 5
Legends: R: Remembrance; U: Understanding; A: Application; N: Analyze; E: Evaluate; C:
Create and above levels (Revised Bloom’s Taxonomy)
Note: This specification table shall be treated as a general guideline for students and teachers. The
actual distribution of marks in the question paper may vary slightly from the above table.
Reference Books:
1. J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
2. M. Kantardzic, “Data Mining: Concepts, Models, Methods, and Algorithms”, John Wiley & Sons Inc.
3. M. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson Education.
4. Pang-Ning Tan, Vipin Kumar, Michael Steinbach, “Introduction to Data Mining”, Pearson Education.
Course Outcome: After learning the course, the students will be able to:
Sr. No. CO statement Marks % weightage
CO-1 Perform the preprocessing of data and apply mining techniques on it. 20
CO-2 Identify the association rules, classification, and clusters in large data sets. 30
CO-3 Solve real world problems in business and scientific information using data mining. 20
CO-4 Use data analysis tools for scientific applications. 15
CO-5 Implement various supervised machine learning algorithms. 15
List of Experiments:
Laboratory work will be based on the above syllabus, with a minimum of 10 experiments to be incorporated.
w.e.f. AY 2018-19