
Unit-1 2

Uploaded by

Sohail Ansari
Data Mining

Dr R.Singh, MJP R.U., Bly


What is Data Mining?

• One of many definitions:
• "Data mining is the science of extracting useful knowledge from huge data repositories."
  (ACM SIGKDD, Data Mining Curriculum: A Proposal)


What is Data Mining? Many Definitions

• Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
• Data mining is the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.

Data Mining
Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.
• valid: holds on new data with some certainty
• novel: non-obvious to the system
• useful: it should be possible to act on the item
• understandable: humans should be able to interpret the pattern
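The "valid" property above can be illustrated with a small check: measure how often a pattern holds on the data it was found in, then measure it again on data that was not used to find it. The sketch below is only an illustration. The rule (bread implies butter), the toy transactions, and the 0.6 cut-off are all assumptions made for the example, not part of the slides.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    matching = [t for t in transactions if antecedent in t]
    if not matching:
        return 0.0
    return sum(consequent in t for t in matching) / len(matching)


# Hypothetical toy data: each transaction is a set of purchased items.
seen_data = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
]
new_data = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter"},
    {"bread", "butter", "jam"},
]

# Confidence of the pattern "bread -> butter" on the data it was mined from...
conf_seen = rule_confidence(seen_data, "bread", "butter")
# ...and on new data that played no part in discovering it.
conf_new = rule_confidence(new_data, "bread", "butter")

# Assumed acceptance threshold: the pattern counts as valid if it still holds
# on new data at least 60% of the time.
is_valid = conf_new >= 0.6
print(conf_seen, conf_new, is_valid)
```

A pattern whose confidence collapses on the new data would fail this check, however strong it looked on the original data.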


Origins of Data Mining

• Draws ideas from AI, machine learning, pattern recognition, statistics, and database systems
• There are differences in terms of
  - the data used and
  - the goals.
Large-scale Data is Everywhere!

• There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
• New mantra
  - Gather whatever data you can whenever and wherever possible.
• Expectations
  - Gathered data will have value either for the purpose collected or for a purpose not envisioned.

[Figure panels: E-Commerce, Cyber Security, Traffic Patterns, Social Networking: Twitter, Sensor Networks, Computational Simulations]

Why Data Mining? Commercial Viewpoint

• Lots of data is being collected and warehoused
  - Web data
    - Google has petabytes of web data
    - Facebook has billions of active users
  - Purchases at department/grocery stores, e-commerce
    - Walmart: 20 million transactions/day, 10 terabyte database; Blockbuster: 36 million households
    - Amazon handles millions of visits/day
  - Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
  - Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Data Mining? Scientific Viewpoint

• Data collected and stored at enormous speeds
  - Remote sensors on a satellite
    - NASA EOSDIS archives over petabytes of earth science data / year
  - Telescopes scanning the skies
    - Sky survey data
  - High-throughput biological data
  - Automatic data capture of transactions
    - e.g. bar codes, POS devices, mouse clicks, location data (GPS, cell phones)
  - Scientific simulations
    - Terabytes of data generated in a few hours
• Data mining helps scientists
  - in automated analysis of massive datasets
  - in hypothesis formation

[Figure panels: Sky Survey Data, fMRI Data from Brain, Surface Temperature of Earth]

Properties of Data Mining

• Key properties of data mining
  - Automatic discovery of patterns
  - Prediction of likely outcomes
  - Creation of actionable information
  - Focus on large datasets and databases


Problems Suitable for Data Mining

Suitable problems:
• require knowledge-based decisions
• have a changing environment
• have sub-optimal current methods
• have accessible, sufficient, and relevant data
• provide a high payoff for the right decisions

Privacy considerations are important if personal data is involved.
Challenges in Data Mining



Other Data Mining Tasks

• Text mining: document clustering, topic models
• Graph mining: social networks
• Data stream mining / real-time data mining
• Mining spatiotemporal data (e.g., moving objects)
• Visual data mining
• Distributed data mining


Architecture of Data Mining

• Data warehouse / database server:
  - This module collects the relevant information from different sources. It coordinates with the user interface module to let the user browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.


• Knowledge base:
  - This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
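A concept hierarchy of the kind mentioned for the knowledge base can be pictured as a mapping from each attribute value to its more general parent. The sketch below is a minimal illustration. The city/state/country hierarchy, the sales figures, and the helper names `generalize` and `roll_up` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical concept hierarchy for a "location" attribute:
# city -> state -> country. Each key maps to its more general parent.
PARENT = {
    "Bareilly": "Uttar Pradesh",
    "Lucknow": "Uttar Pradesh",
    "Mumbai": "Maharashtra",
    "Uttar Pradesh": "India",
    "Maharashtra": "India",
}


def generalize(value, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the top."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def roll_up(counts, levels=1):
    """Aggregate counts keyed by low-level values to a higher abstraction level."""
    totals = Counter()
    for value, n in counts.items():
        totals[generalize(value, levels)] += n
    return dict(totals)


# Hypothetical sales counts recorded at city level.
sales_by_city = {"Bareilly": 120, "Lucknow": 80, "Mumbai": 200}

sales_by_state = roll_up(sales_by_city, levels=1)
sales_by_country = roll_up(sales_by_city, levels=2)
print(sales_by_state)
print(sales_by_country)
```

Mining at the state or country level can surface patterns that are too sparse to notice at the city level.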
• Data mining engine:
  - This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.


• Pattern evaluation module:
  - This module focuses the search toward interesting patterns. It may use interestingness thresholds to filter out discovered patterns.
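One way to picture threshold-based filtering in a pattern evaluation module is shown below. This is a sketch under assumptions: the slides do not say which interestingness measures are used, so support and confidence are chosen here as common examples, and the `Pattern` class, the threshold values, and the sample rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    rule: str          # human-readable description of the pattern
    support: float     # fraction of records the pattern covers
    confidence: float  # how often the rule holds when its condition is met


def filter_interesting(patterns, min_support=0.1, min_confidence=0.6):
    """Keep only patterns that meet both (assumed) interestingness thresholds."""
    return [
        p for p in patterns
        if p.support >= min_support and p.confidence >= min_confidence
    ]


# Hypothetical output of a mining run.
discovered = [
    Pattern("bread -> butter", support=0.30, confidence=0.75),
    Pattern("milk -> jam", support=0.02, confidence=0.90),   # too rare
    Pattern("tea -> sugar", support=0.25, confidence=0.40),  # too unreliable
]

interesting = filter_interesting(discovered)
for p in interesting:
    print(p.rule)
```

Raising the thresholds leaves fewer patterns for the user to review, while lowering them risks flooding the user with weak patterns.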


• User interface:
  - This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results.


Phases of Data Mining: Processing



Data Mining Process
1. Develop an understanding of the application and its goals
2. Create a dataset for study (often from a data warehouse)
3. Data cleaning and preprocessing
4. Data reduction and projection
5. Choose the data mining task
6. Choose the data mining algorithms
7. Use the algorithms to perform the task
8. Interpret the results and iterate through steps 1-7 if necessary
9. Deploy: integrate into operational systems

• State the problem and formulate the hypothesis
  - In this step, the goals of the business are set and the important factors that will help in achieving the goal are discovered.
  - State the meaningful objectives and requirements to get the domain-specific knowledge and experience for the specific business problem.
  - Multiple hypotheses can be formulated for a single problem, so at this stage formulate a generalized hypothesis.

• Collect the data
  - This step collects the whole data and populates it from different sources.
  - Formulate the rules to acquire the data from different locations.
  - At this stage, perform statistical analysis of the data, which helps to estimate the accuracy of a model at a later stage.
  - Data is visualized and queried to check its completeness.
  - Manage it further to get handy results.
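The statistical analysis and completeness check described for the data collection step might look like the sketch below. The records, the field names, and the helpers `completeness` and `summarize` are hypothetical and serve only to illustrate the idea.

```python
import statistics

# Hypothetical records gathered from different sources; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    present = sum(1 for r in rows if r.get(field) is not None)
    return present / len(rows)


def summarize(rows, field):
    """Basic descriptive statistics over the non-missing values of `field`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


age_completeness = completeness(records, "age")
age_summary = summarize(records, "age")
print(age_completeness, age_summary)
```

Low completeness for a field is an early warning that the field may need imputation or may have to be dropped during preprocessing.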

• Preprocessing the data
  - Data preprocessing includes several steps such as variable scaling, different types of encoding, and selecting the important features.
  - The data preparation stage has 4 major steps, which include data purification, data integration, data selection, and data transformation.
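Two of the preprocessing steps named above, variable scaling and encoding, can be sketched in a few lines. The slides do not specify which scaling or encoding method is meant, so min-max scaling and one-hot encoding are used here as assumed examples. The function names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categorical values as 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Hypothetical numeric attribute.
incomes = [20000, 40000, 60000]
scaled_incomes = min_max_scale(incomes)

# Hypothetical categorical attribute.
colors = ["red", "green", "red"]
encoded_colors, color_categories = one_hot(colors)

print(scaled_incomes)
print(color_categories, encoded_colors)
```

Scaling keeps attributes with large numeric ranges from dominating distance-based methods, and one-hot encoding lets algorithms that expect numbers work with categorical attributes.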

• Apply data mining techniques: estimate the model
  - This process is not straightforward; in practice, selecting the best model is an additional, non-trivial task.
  - These steps include pattern evaluation, knowledge representation, and a conclusion drawn from all these stages.
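Selecting the best of several candidate models is often done by scoring each one on data held back from training. The sketch below illustrates that idea only. The threshold "models", the validation examples, and the use of accuracy as the score are assumptions made for the example.

```python
def accuracy(model, examples):
    """Fraction of (feature, label) examples the model classifies correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def make_threshold_model(threshold):
    """Hypothetical classifier: predict 1 when the feature exceeds `threshold`."""
    return lambda x: 1 if x > threshold else 0


# Candidate models that differ only in their threshold.
candidates = {f"threshold>{t}": make_threshold_model(t) for t in (2, 5, 8)}

# Hypothetical held-out validation set of (feature, label) pairs.
validation = [(1, 0), (3, 0), (4, 0), (6, 1), (7, 1), (9, 1)]

# Score every candidate on the same validation data and keep the best one.
scores = {name: accuracy(model, validation) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```

In a real project the candidates would be different algorithms or parameter settings, and the comparison would usually rest on more than one score.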

• Interpret the model and draw conclusions
  - The data mining model should be understandable, interpretable, and highly accurate.
  - Interpreting these models, also vital, is treated as a separate task, with specific techniques to validate the results.
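The slides do not name the validation techniques they have in mind. As one assumed example, the sketch below compares a model's predictions with known labels on held-out data and reports accuracy, precision, and recall. The label lists and function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn


def validation_report(actual, predicted):
    """Accuracy, precision, and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and the model's predictions for them.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

report = validation_report(actual, predicted)
print(report)
```

Reporting precision and recall alongside accuracy helps when one class is much rarer than the other, since accuracy alone can look high while the rare class is mostly missed.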
