Overview: Data Mining Methods: WEKA: A Machine Learning Toolkit The Explorer

The document provides an overview of WEKA, an open-source machine learning toolkit. WEKA contains tools for data pre-processing, classification, regression, clustering, association rule mining, and visualization. It can import data from files or databases, apply filters to pre-process data, and use various machine learning algorithms to build classifiers or find patterns in data. WEKA also provides graphical user interfaces that make these tools easy to use.

Uploaded by ThoToet
Copyright © All Rights Reserved

Overview:

Data Mining Methods

WEKA Tutorial

§ WEKA: A Machine Learning Toolkit
§ The Explorer
§ Classification and Regression
§ Clustering
§ Association Rules
§ Attribute Selection
§ Data Visualization
§ The Experimenter
§ The Knowledge Flow GUI
§ Conclusions

WEKA - Introduction

§ Machine learning/data mining software written in Java (distributed under the GNU General Public License)
§ Used for research, education, and applications

§ Main features:
§ Comprehensive set of data pre-processing tools, learning algorithms
and evaluation methods
§ Graphical user interfaces (incl. data visualization)
§ Environment for comparing learning algorithms

Pre-processing the data

§ Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
§ Data can also be read from a URL or from an SQL
database (using JDBC)
§ Pre-processing tools in WEKA are called “filters”
§ WEKA contains filters for:
§ Discretization, normalization, resampling, attribute selection,
transforming and combining attributes, …
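Two of the filter operations listed above, normalization and discretization, can be illustrated on a single numeric attribute. The sketch below is plain Python, not WEKA's Java filter API: the function names `normalize` and `discretize` are hypothetical, missing values are assumed to be represented as `None`, and discretization uses equal-width bins.

```python
from typing import List, Optional

def normalize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Min-max scale values into [0, 1]; None (missing) passes through unchanged.
    Assumes at least one non-missing value."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    # A constant attribute has zero range; map every present value to 0.0.
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]

def discretize(values: List[Optional[float]], bins: int) -> List[Optional[int]]:
    """Equal-width binning: split [min, max] into `bins` intervals, return each value's bin index."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    out: List[Optional[int]] = []
    for v in values:
        if v is None:
            out.append(None)  # keep missing values missing
        elif width == 0:
            out.append(0)     # constant attribute: everything in one bin
        else:
            # The maximum would fall into bin index `bins`; clamp it into the last bin.
            out.append(min(int((v - lo) / width), bins - 1))
    return out

if __name__ == "__main__":
    cholesterol = [233, 286, 229, None]  # None stands for a missing ('?') value
    print(normalize(cholesterol))
    print(discretize(cholesterol, bins=3))
```

Equal-width bins are the simplest choice; they are sensitive to outliers, which is one reason toolkits also offer other binning strategies.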

WEKA with “flat” files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
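To make the file format concrete, here is a minimal reader for simple ARFF files like the one above. It is a sketch under strong assumptions, not WEKA's own loader: it handles only `numeric` (also `real`/`integer`) and `{...}` nominal attributes, dense comma-separated rows without quoting, `%` comments, and `?` for missing values. The names `parse_arff` and `Attribute` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A parsed cell: float for numeric, str for nominal, None for a missing '?' value.
Value = Union[float, str, None]

@dataclass
class Attribute:
    name: str
    kind: str  # "numeric" or "nominal"
    labels: List[str] = field(default_factory=list)  # declared values of a nominal attribute

def parse_arff(text: str) -> Tuple[str, List[Attribute], List[List[Value]]]:
    """Hypothetical minimal ARFF reader: dense data, no quoting, no string/date types."""
    relation = ""
    attributes: List[Attribute] = []
    rows: List[List[Value]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        keyword = line.split(None, 1)[0].lower()
        if not in_data:
            if keyword == "@relation":
                relation = line.split(None, 1)[1]
            elif keyword == "@attribute":
                name, spec = line.split(None, 2)[1:]
                if spec.startswith("{"):
                    # Nominal attribute: comma-separated labels inside braces.
                    labels = [s.strip() for s in spec.strip("{}").split(",")]
                    attributes.append(Attribute(name, "nominal", labels))
                elif spec.lower() in ("numeric", "real", "integer"):
                    attributes.append(Attribute(name, "numeric"))
                else:
                    raise ValueError(f"unsupported attribute type: {spec}")
            elif keyword == "@data":
                in_data = True
            continue
        # Data section: one instance per line, values in attribute order.
        cells = [c.strip() for c in line.split(",")]
        if len(cells) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got: {line}")
        row: List[Value] = []
        for attr, cell in zip(attributes, cells):
            if cell == "?":
                row.append(None)  # missing value
            elif attr.kind == "numeric":
                row.append(float(cell))
            elif cell in attr.labels:
                row.append(cell)
            else:
                raise ValueError(f"{cell!r} is not a declared value of {attr.name}")
        rows.append(row)
    return relation, attributes, rows

# A cut-down version of the heart-disease file above.
SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

if __name__ == "__main__":
    relation, attrs, data = parse_arff(SAMPLE)
    print(relation, [a.name for a in attrs], data)
```

Validating nominal values against the header mirrors the point of ARFF: the header declares each attribute's type, so malformed rows can be rejected when the file is read.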

Building “Classifiers”

§ Classifiers in WEKA are models for predicting nominal or numeric quantities
§ Implemented learning schemes include:
§ Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …

§ “Meta”-classifiers include:
§ Bagging, boosting, stacking, error-correcting output codes, locally
weighted learning, …
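As an illustration of one of the learning schemes listed above, the sketch below implements a k-nearest-neighbour classifier, the basic idea behind instance-based learners, for purely numeric attributes. It is plain Python rather than WEKA's implementation; `knn_predict` is a hypothetical name, distances are unweighted Euclidean over raw values (in practice attributes are usually normalized first, as with the filters described earlier), and vote ties go to the label whose first occurrence is nearest.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Predict the class of `query` by majority vote among its k nearest training instances."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training instances")
    # Sort training instances by Euclidean distance to the query (stable sort).
    by_distance = sorted(train, key=lambda inst: math.dist(inst[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common keeps first-encountered order for equal counts,
    # so on a tie the label of the nearer neighbour wins.
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical two-attribute training data with classes "a" and "b".
    train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"),
             ((5.0, 5.0), "b"), ((6.0, 5.5), "b"), ((5.5, 6.0), "b")]
    print(knn_predict(train, (1.2, 1.4), k=3))
    print(knn_predict(train, (5.2, 5.1), k=3))
```

Instance-based learning does no work at training time; all the cost is paid at prediction time, when every stored instance is compared with the query.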

Clustering

§ WEKA contains many clustering implementations, which work with both discrete and numerical data
§ Example: k-means
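The k-means procedure alternates two steps: assign each instance to its nearest centroid, then move each centroid to the mean of its assigned instances, stopping when no assignment changes. The sketch below shows this for numeric data only. Seeding the centroids with the first k instances is an assumption made here for reproducibility; WEKA's implementation initializes differently and, as noted above, its clusterers also accept discrete data, which this sketch does not. The function name `kmeans` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means on numeric points; returns (centroids, cluster index per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic seeding with the first k points (an assumption of this sketch).
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

if __name__ == "__main__":
    pts = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.0), (8.5, 9.5)]
    print(kmeans(pts, k=2))
```

Because the result depends on the initial centroids, k-means is usually run several times with different seeds and the clustering with the lowest within-cluster error is kept.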

Finding Associations

§ WEKA contains an implementation of the Apriori algorithm for learning association rules
§ Works only with discrete data
§ Can identify statistical dependencies between groups
of attributes:
§ milk, butter -> bread, eggs (with confidence 0.9 and
support 2000)
§ Apriori can compute all rules that have a given
minimum support and exceed a given confidence
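Support and confidence can be made concrete with a small example. The sketch below applies the level-wise Apriori idea to a few hypothetical market-basket transactions: candidate itemsets of size n+1 are built only from frequent itemsets of size n, because every subset of a frequent itemset must itself be frequent. Support is counted as an absolute number of transactions (as in the "support 2000" example above), and the confidence of X -> Y is support(X ∪ Y) / support(X). This is an illustrative Python sketch, not WEKA's Apriori class; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]

def frequent_itemsets(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_support` transactions, with its count."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {s: n for s, n in count(items).items() if n >= min_support}
    result = dict(current)
    size = 1
    while current:
        size += 1
        # Join step: union pairs of frequent itemsets that give an itemset one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: drop candidates with an infrequent subset (the Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        current = {s: n for s, n in count(candidates).items() if n >= min_support}
        result.update(current)
    return result

def rules(freq: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, int, float]]:
    """Generate rules X -> Y with confidence support(X ∪ Y) / support(X) >= min_conf."""
    out = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = support / freq[lhs]  # lhs is frequent because itemset is
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, support, conf))
    return out

# Hypothetical transactions for illustration.
TRANSACTIONS = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
    {"milk", "butter", "bread", "eggs"},
]

if __name__ == "__main__":
    for lhs, rhs, sup, conf in rules(frequent_itemsets(TRANSACTIONS, 3), 0.9):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup} confidence={conf:.2f}")
```

With a minimum support of 3, the pair {milk, eggs} appears in only two transactions, so no candidate containing it is ever counted; this pruning is what keeps Apriori tractable on large item sets.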

Data visualization

§ Visualization is very useful in practice:
§ e.g., it helps to determine the difficulty of the learning problem
§ WEKA can visualize single attributes and pairs of attributes
§ To do: rotating 3-D visualizations (XGobi-style)
§ Color-coded class values
§ “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
§ “Zoom-in” function
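The jitter option addresses a plotting problem: instances that share the same nominal value are drawn at exactly the same position, so many points look like one. Adding a small random offset to each plotted coordinate spreads them out without changing the data. The sketch below shows the idea for a nominal attribute mapped to integer positions; the function name, the `amount` parameter and the uniform noise are assumptions of this sketch, not a description of WEKA's plotting code.

```python
import random
from typing import List, Optional, Sequence

def jitter_nominal(values: Sequence[str], labels: Sequence[str],
                   amount: float = 0.2, seed: Optional[int] = None) -> List[float]:
    """Map each nominal value to its label index plus uniform noise in [-amount, amount].
    Hypothetical helper; keep amount < 0.5 so jittered points stay nearest their own label."""
    rng = random.Random(seed)  # seeded generator makes a plot reproducible
    index = {label: i for i, label in enumerate(labels)}
    return [index[v] + rng.uniform(-amount, amount) for v in values]

if __name__ == "__main__":
    chest_pain = ["asympt", "asympt", "typ_angina", "asympt"]
    labels = ["typ_angina", "asympt", "non_anginal", "atyp_angina"]
    print(jitter_nominal(chest_pain, labels, seed=1))
```

Without jitter the three "asympt" instances would be drawn as a single point; with it, the size of each group becomes visible.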