CS699
Lecture 1
Introduction
• Our focus is “data mining” not “data warehousing.”
• Data mining is an important component of data analysis.
• Will discuss
– Data preprocessing
– Basic data mining algorithms
– How to evaluate data mining models and data mining results
– How to perform data mining using software tools
• A good data mining web site: kdnuggets.com
• A good dataset site: UCI Machine Learning Repository
• Prerequisites:
– CS546 and either CS669 or CS579, or instructor’s consent.
• Math requirements
– Math is a tool to describe algorithms
– Mostly basic algebra (not linear algebra) and basic
probabilities and statistics
– A little bit of calculus
– You will have to do calculations using a calculator (which has a
“log” function)
• You will practice data mining with Weka, JMP Pro, and
Oracle.
• These software packages are used for the assignments.
• Weka:
– Free
– Easy to learn and easy to use
– Has a large number of data mining algorithms
– You will use it immediately
– Also used for class project
• Oracle data mining: takes time to learn
• You will learn how to use it through the assignments
• Oracle:
– Will use preconfigured virtual machine
– VM runs on Linux, but you don’t need to use Linux
– You will use SQL Developer for data mining
• JMP Pro: statistical analysis software with some data
mining algorithms implemented in it
• Freely available from BU’s IT website (refer to
homework 1)
• You will use it for assignments
• Class project:
– Building and testing classifier models using a real‐world
dataset
– You will use primarily Weka
– You may use any other tools, including R, Python, or JMP Pro
for data preprocessing
• Each week
– Quiz (except in Week 6)
– Assignment
– Discussion
• Live Class
8:00 – 10:00 PM EST, every Wednesday
• Live Class (and/or Q & A )
11 AM – 12 PM EST, every Saturday
• Attendance is not mandatory, but students must study
the live class material.
Blackboard
• Under Class Discussion (Discussion Board)
– Announcement (Common Area)
– Live Classroom Slides
– Weka issues
– Oracle issues
– JMP Pro issues
– Around the Clock Help (other questions)
Why Data Mining?
• The Explosive Growth of Data: from terabytes to petabytes
– Data collection and data availability
• Automated data collection tools, database systems, Web,
computerized society
– Major sources of abundant data
• Business: Web, e‐commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube, social network
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”—Data mining—Automated analysis of
massive data sets
What Is Data Mining?
• Data mining (knowledge discovery from data)
– Extraction of interesting (non‐trivial, implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”?
– Simple search and query processing
– (Deductive) expert systems
Knowledge Discovery (KDD) Process
• This is a view from typical database systems and data warehousing
communities
• Data mining plays an essential role in the knowledge discovery process
• The steps, shown as a figure on the slide:
Databases → Data Cleaning / Data Integration → Data Warehouse →
Selection → Task-relevant Data → Data Mining → Pattern Evaluation
Data Mining in Business Intelligence
• The slide shows a pyramid of layers with increasing potential to support
business decisions (bottom to top), together with the role working at
each layer:
– Decision Making (End User)
– Data Presentation: Visualization Techniques (Business Analyst)
– Data Mining: Information Discovery (Data Analyst)
– Data Exploration: Statistical Summary, Querying, and Reporting
– Data Preprocessing/Integration, Data Warehouses (DBA)
– Data Sources: Paper, Files, Web documents, Scientific experiments,
Database Systems
A Typical View from ML and Statistics
• The slide shows a pipeline:
Input Data → Data Pre-Processing → Data Mining → Post-Processing
– Pre-processing: data integration, normalization, feature selection,
dimension reduction
– Mining: pattern discovery, association & correlation, classification,
clustering, outlier analysis, …
– Post-processing: pattern evaluation, pattern selection, pattern
interpretation, pattern visualization
• This is a view from typical machine learning and statistics communities
What Kinds of Data?
• Database‐oriented data sets and applications
– Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
– Data streams and sensor data
– Time‐series data, temporal data, sequence data (incl. bio‐sequences)
– Structure data, graphs, social networks and multi‐linked data
– Object‐relational databases
– Heterogeneous databases and legacy databases
– Spatial data and spatiotemporal data
– Multimedia database
– Text databases
– The World‐Wide Web
Data Types
• Categorical (or nominal) vs. numeric data:
Categorical:
OID  Age     Income  Buy?
1    Young   Low     Y
2    Young   High    Y
3    Old     Low     N
4    Middle  Low     Y
5    Middle  High    N
6    Old     Low     N
7    Young   High    N
8    Old     High    Y
9    Old     High    Y
10   Young   Low     N

Numeric:
OID  Age  Height  Weight
1    15   60      180
2    8    48      115
3    32   72      153
4    27   65      145
5    17   58      189
6    56   70      150
7    72   56      163
8    22   63      172
9    42   71      139
10   39   68      150
Classification
• Classification and label prediction
– Construct models (functions) based on some training examples, called
training dataset.
– Describe and distinguish classes or concepts for future prediction
• E.g., classify countries based on (climate), or classify cars based on (gas
mileage)
– Predict some unknown class label (or class attribute)
• Typical methods
– Decision trees, naïve Bayesian classification, support vector machines, neural
networks, rule‐based classification, pattern‐based classification, logistic
regression, …
• Typical applications:
– Credit card fraud detection, direct marketing, classifying stars, diseases,
web‐pages, …
• Also called supervised learning
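Decision-tree learners pick split attributes with measures such as information gain; J48 (Weka's C4.5 implementation) actually uses the gain-ratio refinement, but information gain is the core quantity. A minimal sketch, computed on the categorical buy table from the "Data Types" slide, using only the standard library:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, class_attr="Buy"):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    labels = [r[class_attr] for r in rows]
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[class_attr] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# The categorical table from the "Data Types" slide.
data = [
    {"Age": "Young",  "Income": "Low",  "Buy": "Y"},
    {"Age": "Young",  "Income": "High", "Buy": "Y"},
    {"Age": "Old",    "Income": "Low",  "Buy": "N"},
    {"Age": "Middle", "Income": "Low",  "Buy": "Y"},
    {"Age": "Middle", "Income": "High", "Buy": "N"},
    {"Age": "Old",    "Income": "Low",  "Buy": "N"},
    {"Age": "Young",  "Income": "High", "Buy": "N"},
    {"Age": "Old",    "Income": "High", "Buy": "Y"},
    {"Age": "Old",    "Income": "High", "Buy": "Y"},
    {"Age": "Young",  "Income": "Low",  "Buy": "N"},
]

print(info_gain(data, "Age"), info_gain(data, "Income"))
```

On this tiny table both gains are near zero (every Age group is an even Y/N split), which is itself instructive: a tree learner needs attributes that actually separate the classes.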
• Example (decision tree)
Classify a car with unknown class
label (risk):
4‐door, 4‐cylinder, wagon.
==> risk = 1
• Music CD purchase dataset example
• A synthetic dataset, where 1’s and 0’s were entered
arbitrarily.
• Contains information about customers’ purchases of
music CDs collected over a certain period of time, say
the past 12 months.
• A 1 in the dataset indicates that the customer
purchased a CD by the musician at least once in the
past 12 months.
• The class attribute indicates whether a customer is
“young” or “old.”
• 12 attributes: 1 ID, 10 predictor (independent) attributes, and 1
class (dependent) attribute
• 50 tuples
• A part of the dataset
• Decision tree generated by J48 algorithm
Classification vs. Numeric Prediction
• Classification:
– Predicted (dependent) attribute is a nominal attribute.
– Example: Predict whether a customer will buy a computer or not (yes or
no, for example).
• Numeric prediction:
– Predicted (dependent) attribute is a numeric attribute.
– Example: Predict the weight (numeric value) of a person given the age and
the height of the person.
– Example: CPU dataset
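A minimal numeric-prediction sketch: fitting a one-variable least-squares line to predict Weight from Height, using the numeric table from the "Data Types" slide. (A real numeric predictor, such as one for the CPU dataset, would typically use several predictor attributes; this only illustrates the idea.)

```python
# One-variable least-squares regression: predict Weight from Height,
# using the numeric table from the "Data Types" slide.
heights = [60, 48, 72, 65, 58, 70, 56, 63, 71, 68]
weights = [180, 115, 153, 145, 189, 150, 163, 172, 139, 150]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# slope = cov(height, weight) / var(height);
# the intercept makes the line pass through the two means.
num = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
den = sum((h - mean_h) ** 2 for h in heights)
slope = num / den
intercept = mean_w - slope * mean_h

def predict_weight(height):
    return intercept + slope * height

print(round(slope, 3), round(predict_weight(65), 1))
```

On this toy table the fitted slope is small (about 0.14), a reminder that a numeric prediction model is only as good as the relationship present in the data.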
Association and Correlation Analysis
• Frequent patterns (or frequent itemsets)
– What items are frequently purchased together in a grocery
store?
– Mine all frequent itemsets and then all strong rules.
– An itemset is frequent if its support is >= a predefined
threshold, the minimum support.
– A rule is written as: <left-hand side> => <right-hand side>
– Example of a rule: {milk, butter} => {cheese, egg}
– A rule is strong if its confidence is >= a predefined
threshold, the minimum confidence.
• Example (transaction table shown in the slide figure)
• Support examples:
– support of {bread} = 7
– support of {egg, milk} = 4
– support of {bread, egg, milk} = 3
• A rule R: {bread} => {egg, milk}
• Quality measures of the rule and informal interpretation:
– Support(R) = 33.3% (3/9): the fraction of people who purchased
{bread, milk, egg}
– Confidence(R) = 42.9% (3/7): among those who purchased bread, the
fraction who also purchased {milk, egg}
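The counts quoted above can be checked directly in code. The transaction list below is hypothetical, constructed only to match the stated counts (9 transactions; support({bread}) = 7, support({egg, milk}) = 4, support({bread, egg, milk}) = 3):

```python
# Hypothetical 9-transaction list matching the counts quoted on the slide.
transactions = [
    {"bread", "egg", "milk"},
    {"bread", "egg", "milk"},
    {"bread", "egg", "milk"},
    {"bread", "egg"},
    {"bread"},
    {"bread"},
    {"bread"},
    {"egg", "milk"},
    {"butter"},
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_measures(lhs, rhs):
    """Support and confidence of the rule lhs => rhs."""
    both = support_count(lhs | rhs)
    support = both / len(transactions)
    confidence = both / support_count(lhs)
    return support, confidence

s, c = rule_measures({"bread"}, {"egg", "milk"})
print(f"support = {s:.3f}, confidence = {c:.3f}")  # 3/9 and 3/7
```

Note that confidence is normalized by the support count of the left-hand side only, which is why the same rule has a lower support (3/9) than confidence (3/7).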
Association and Correlation Analysis
• Music CD purchase dataset (preprocessed for Weka’s Apriori
algorithm)
• A part of the dataset
• Some association rules (mined by Apriori algorithm)
• Handel=t Mahler=t 5 ==> Bach=t 5 <conf:(1)>
• Bach=t Haydn=t Mendelssohn=t 5 ==> Mozart=t 5 <conf:(1)>
• Bach=t Haydn=t Mozart=t 5 ==> Mendelssohn=t 5 <conf:(1)>
• Bach=t Mendelssohn=t 7 ==> Mozart=t 6 <conf:(0.86)>
• Bach=t Handel=t 6 ==> Mahler=t 5 <conf:(0.83)>
• Bach=t Mozart=t Mendelssohn=t 6 ==> Haydn=t 5 <conf:(0.83)>
• Haydn=t Mendelssohn=t 9 ==> Mozart=t 7 <conf:(0.78)>
• Haydn=t Mozart=t 9 ==> Mendelssohn=t 7 <conf:(0.78)>
• Bach=t Mozart=t 8 ==> Mendelssohn=t 6 <conf:(0.75)>
• Mahler=t 14 ==> Bach=t 10 <conf:(0.71)>
• Association, correlation vs. causality
– Are strongly associated items also strongly correlated?
– If two items are strongly correlated, is there a causal
relationship?
• How to mine such patterns and rules efficiently in large
datasets?
• Association rules can also be used for classification or
clustering.
Cluster Analysis
• Unsupervised learning (i.e., there is no class label)
• Group data to form new categories (i.e., clusters), e.g., cluster
customers into different groups
• Principle: Maximizing intra-class similarity & minimizing
inter-class similarity
• Many methods and applications
• Clustering output types are shown in the slide figure
• London cholera epidemic (Source: J. Leskovec, A. Rajaraman,
and J.D. Ullman, “Mining of Massive Datasets,” 2014, page 3.)
• Iris dataset (from UCI ML Repository)
• Used for classification
• Has 4 attributes and class attribute
• Class attribute: type of iris plant
• A part of the dataset
• A clustering algorithm was run on only two attributes
• Clustering result visualization
• X: petallength, Y: petalwidth
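A minimal k-means (Lloyd's algorithm) sketch in the spirit of the two-attribute clustering above. The points are hypothetical, loosely shaped like the small-petal vs. large-petal iris groups; the run shown on the slide used Weka, not this code.

```python
# Minimal k-means: alternate between assigning points to their nearest
# centroid and moving each centroid to the mean of its cluster.
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical (petallength, petalwidth)-style points.
points = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (1.6, 0.4),   # small petals
          (4.9, 1.8), (5.1, 2.0), (4.7, 1.6), (5.4, 2.1)]   # large petals
centroids, clusters = kmeans(points, centroids=[points[0], points[4]])
print(centroids)
```

Because this is unsupervised, the algorithm never sees the iris class labels; on well-separated data like this it still recovers the two natural groups.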
Outlier Analysis
– Outlier: A data object that does not comply with the general
behavior of the data
– Noise or exception? ― One person’s garbage could be
another person’s treasure
– Methods: byproduct of clustering or regression analysis, …
– Useful in fraud detection, rare events analysis
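One simple statistical method for the "does not comply with the general behavior" idea above is a z-score check: flag values more than k standard deviations from the mean. This is only one method among many (clustering- and regression-based detection work on the same principle), and the data below are hypothetical:

```python
from math import sqrt

def zscore_outliers(values, k=2.0):
    """Return the values more than k standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) > k * std]

# Hypothetical transaction amounts with one obviously unusual value.
amounts = [42, 38, 45, 41, 39, 44, 40, 500]
print(zscore_outliers(amounts))
```

Whether a flagged value is noise to discard or a fraud case to investigate is a judgment call, which is exactly the "one person's garbage, another person's treasure" point above.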
Sequential Pattern, Trend and Evolution Analysis
– Trend, time‐series, and deviation analysis: e.g., regression
and value prediction
– Sequential pattern mining
• e.g., first buy digital camera, then buy large SD memory
cards
– Periodicity analysis
– Biological sequence analysis
Evaluation of Knowledge
• Are all mined knowledge interesting?
– One can mine a tremendous amount of “patterns” and knowledge
– Some may fit only certain dimension space (time, location, …)
– Some may not be representative, may be transient, …
• A pattern is interesting if it is
– easily understood
– valid on new data or test data with some degree of certainty
– potentially useful
– novel
• Objective measures (e.g., support and confidence of an
association rule)
• Subjective measures (e.g., expected/unexpected, actionable)
Technologies Used in Data Mining
• Data mining draws on many surrounding technologies (slide figure):
machine learning, pattern recognition, statistics, visualization,
applications, algorithms, database technology, and high-performance
computing
Applications of Data Mining
• Web page analysis: from web page classification, clustering to PageRank &
HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis to targeted marketing
• Biological and medical data analysis: classification, cluster analysis
(microarray data analysis), biological sequence analysis, biological network
analysis
• Data mining and software engineering
• From major dedicated data mining systems/tools (e.g., SAS, MS SQL‐Server
Analysis Manager, Oracle Data Mining Tools) to invisible data mining
38
Major Issues in Data Mining
• Mining Methodology
• User Interaction
• Efficiency and Scalability
• Diversity of data types
• Data mining and society
39
What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from the
organization’s operational database
– Supports information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A data warehouse is a subject‐oriented, integrated, time‐variant, and
nonvolatile collection of data in support of management’s decision‐making
process.”—W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses
Data Warehouse—Subject‐Oriented
• Organized around major subjects, such as customer, product,
sales
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing
• Provide a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on‐line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical
perspective (e.g., past 5‐10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain
“time element”
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
OLTP vs. OLAP

Feature             OLTP                          OLAP
users               clerk, IT professional        knowledge worker
function            day-to-day operations         decision support
DB design           application-oriented          subject-oriented
data                current, up-to-date;          historical; summarized,
                    detailed, flat relational;    multidimensional;
                    isolated                      integrated, consolidated
usage               repetitive                    ad-hoc
access              read/write; index/hash        lots of scans
                    on primary key
unit of work        short, simple transaction     complex query
# records accessed  tens                          millions
# users             thousands                     hundreds
DB size             100 MB to GB                  100 GB to TB
metric              transaction throughput        query throughput, response time
Data Warehouse: A Three-Tier Architecture
• The slide figure shows three tiers:
– Bottom tier (data warehouse server): operational DBs and other
sources are brought in via extract, transform, load, and refresh,
coordinated by a monitor and integrator; this tier holds the data
warehouse, data marts, and a metadata repository
– Middle tier (OLAP engine): OLAP server
– Top tier (front-end tools): query and reports, analysis, data mining
Three Data Warehouse Models
• Enterprise warehouse
– collects all of the information about subjects spanning the
entire organization
• Data Mart
– a subset of corporate‐wide data that is of value to a specific
group of users. Its scope is confined to specific, selected
groups, such as a marketing data mart
• Independent vs. dependent (directly from warehouse) data mart
• Virtual warehouse
– A set of views over operational databases
– Only some of the possible summary views may be
materialized
Extraction, Transformation, and Loading (ETL)
• Data extraction
– get data from multiple, heterogeneous, and external sources
• Data cleaning
– detect errors in the data and rectify them when possible
• Data transformation
– convert data from legacy or host format to warehouse format
• Load
– sort, summarize, consolidate, compute views, check integrity,
and build indices and partitions
• Refresh
– propagate the updates from the data sources to the
warehouse
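The ETL steps above can be sketched in a few lines. This toy example extracts records from two hypothetical sources that use different field names and currencies, transforms them to a single warehouse convention (the consistency-of-encoding point from the integration slide), and loads them into one consolidated store; all field names and the exchange rate are made up for illustration:

```python
# Toy ETL: extract from two hypothetical sources, transform to one
# convention (shared field names, a single currency), load into one list.
source_a = [{"cust": "C1", "price_usd": 100.0}]
source_b = [{"customer_id": "C2", "price_eur": 80.0}]

EUR_TO_USD = 1.10  # assumed fixed rate, for the example only

def transform_a(rec):
    # rename the customer field to the warehouse convention
    return {"customer_id": rec["cust"], "price_usd": rec["price_usd"]}

def transform_b(rec):
    # convert the price to the warehouse's currency
    return {"customer_id": rec["customer_id"],
            "price_usd": round(rec["price_eur"] * EUR_TO_USD, 2)}

# extract -> transform -> load
warehouse = ([transform_a(r) for r in source_a] +
             [transform_b(r) for r in source_b])
print(warehouse)
```

Real ETL tools add the cleaning, integrity-checking, and refresh machinery listed above, but the shape of the work (per-source transforms into one target schema) is the same.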
Metadata Repository
• Metadata is the data defining warehouse objects. The metadata repository stores:
• Description of the structure of the data warehouse
– schema, view, dimensions, hierarchies, derived data definitions, data mart
locations and contents
• Operational meta‐data
– data lineage (history of migrated data and transformation path), currency
of data (active, archived, or purged), monitoring information (warehouse
usage statistics, error reports, audit trails)
• The algorithms used for summarization
• The mapping from operational environment to the data warehouse
• Data related to system performance
– warehouse schema, view and derived data definitions
• Business data
– business terms and definitions, ownership of data, charging policies
References
• Han, J., Kamber, M., and Pei, J., Data Mining: Concepts and
Techniques, 3rd ed., Morgan Kaufmann, 2012
• http://www.cs.illinois.edu/~hanj/bk3/