Outlier Detection - Weka - IQR

This document provides instructions for using the WEKA Explorer tool to preprocess a dataset through outlier detection and removal. It describes using the InterquartileRange filter to add attributes flagging outlier and extreme values, then using the RemoveWithValues filter to remove instances containing outliers according to those flag attributes. The goal is to clean the dataset before further analysis by detecting and removing outlier instances.

Lab Exercise One

Data Preprocessing with WEKA Explorer

Using Filters to handle outliers and extreme values


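Unsupervised Attribute Filter – InterquartileRange: This filter adds new attributes
that indicate whether an instance's values can be considered outliers or extreme
values. The decision depends on how far a value lies outside the interquartile range
(IQR = Q3 - Q1) of its numeric attribute.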
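The filter's rule can be sketched in Python. This is a minimal illustration, not WEKA's code. The helper names `iqr_thresholds` and `classify` are hypothetical. The quartiles come from Python's `statistics.quantiles` with inclusive interpolation, and WEKA's own quartile estimate may differ slightly on small samples. The factors default to 3.0 (outlier) and 6.0 (extreme value), which are WEKA's defaults. A value lying more than outlierFactor IQRs beyond Q1 or Q3 is an outlier. A value lying more than extremeValuesFactor IQRs beyond them is an extreme value and is not also counted as an outlier.

```python
from statistics import quantiles


def iqr_thresholds(values, outlier_factor=3.0, extreme_factor=6.0):
    """Return (lower_extreme, lower_outlier, upper_outlier, upper_extreme)."""
    # Quartiles via inclusive interpolation (an assumption; WEKA's may differ).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return (
        q1 - extreme_factor * iqr,
        q1 - outlier_factor * iqr,
        q3 + outlier_factor * iqr,
        q3 + extreme_factor * iqr,
    )


def classify(x, thresholds):
    """Label x as 'extreme', 'outlier' or 'normal' using the IQR rule."""
    lower_extreme, lower_outlier, upper_outlier, upper_extreme = thresholds
    # Extreme: above Q3 + EVF*IQR or below Q1 - EVF*IQR.
    if x > upper_extreme or x < lower_extreme:
        return "extreme"
    # Outlier: past the outlier bound but still inside the extreme bound.
    if x > upper_outlier or x < lower_outlier:
        return "outlier"
    return "normal"


values = [12, 15, 10, 40, 13, 16, 12, 30, 14, 15]
bounds = iqr_thresholds(values)
print(bounds)
print([classify(v, bounds) for v in values])
```

For these ten sample values, Q1 = 12.25 and Q3 = 15.75. With an IQR of 3.5, the value 30 falls in the outlier band and 40 is an extreme value.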

1. Open the dataset small_telco_labOne and replace its missing values with the
ReplaceMissingValues filter. Note that the dataset has 22 attributes in total, so
the attributes added by the InterquartileRange filter below will start at index 23.

2. Click the Choose button under Filter, then click the Filter button at the bottom
of the drop-down window.

3. A window called Filtering Capabilities opens. It shows which kinds of attributes
the filters support. Make sure that only Numeric Attributes and Numeric Class are
checked, then click OK.

4. Choose the InterquartileRange filter from the unsupervised attribute filters in
the drop-down list.

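5. Left-click the filter's box to open its properties window, and click the More
button to see a description of the filter. The outlierFactor and extremeValuesFactor
properties define outliers and extreme values. They set how many interquartile
ranges beyond Q1 or Q3 a value must lie (3.0 and 6.0 by default).
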
6. Click the Apply button at the right end of the filter box. Two extra attributes,
Outlier and ExtremeValue, are appended to the dataset. They flag an instance as an
outlier or an extreme value if any of its attribute values is judged to be one.

7. If the detectionPerAttribute option of the InterquartileRange filter is changed
from False to True, the filter instead generates an outlier/extreme-value indicator
pair for each attribute.

8. Click each generated attribute to inspect the outlier and extreme values of the
corresponding original attribute. Select the indicator attributes that flag no
outlier or extreme values and delete them with the Remove button.

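Steps 6 to 8 can be sketched in the same way. This is again a hypothetical stand-alone helper rather than WEKA's implementation, and it uses the same quartile assumption. With `detection_per_attribute=False`, it appends one Outlier/ExtremeValue pair per instance, set to "yes" if any attribute qualifies. With `True`, it appends one pair per attribute. The ordering of those pairs (outlier then extreme, attribute by attribute) is an assumption. `unused_indicators` lists the indicator columns that never contain "yes", which are the ones step 8 suggests removing.

```python
from statistics import quantiles

OUTLIER_FACTOR = 3.0   # WEKA's default outlierFactor
EXTREME_FACTOR = 6.0   # WEKA's default extremeValuesFactor


def status(x, q1, q3):
    """Classify one value against its attribute's quartiles."""
    iqr = q3 - q1
    if x > q3 + EXTREME_FACTOR * iqr or x < q1 - EXTREME_FACTOR * iqr:
        return "extreme"
    if x > q3 + OUTLIER_FACTOR * iqr or x < q1 - OUTLIER_FACTOR * iqr:
        return "outlier"
    return "normal"


def add_iqr_flags(rows, detection_per_attribute=False):
    """Append yes/no indicator columns to each row of numeric values."""
    quartiles = []
    for column in zip(*rows):
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        quartiles.append((q1, q3))
    flagged = []
    for row in rows:
        statuses = [status(x, q1, q3) for x, (q1, q3) in zip(row, quartiles)]
        if detection_per_attribute:
            # One outlier/extreme pair per attribute (ordering is an assumption).
            flags = []
            for s in statuses:
                flags += ["yes" if s == "outlier" else "no",
                          "yes" if s == "extreme" else "no"]
        else:
            # One pair per instance: "yes" if any attribute qualifies.
            flags = ["yes" if "outlier" in statuses else "no",
                     "yes" if "extreme" in statuses else "no"]
        flagged.append(list(row) + flags)
    return flagged


def unused_indicators(flagged, first_indicator):
    """0-based indices of indicator columns that never contain "yes"."""
    return [c for c in range(first_indicator, len(flagged[0]))
            if all(row[c] == "no" for row in flagged)]


rows = [[12, 5], [15, 5], [10, 6], [40, 5], [13, 6],
        [16, 5], [12, 5], [30, 6], [14, 5], [15, 6]]
print(add_iqr_flags(rows)[3])                 # [40, 5, 'no', 'yes']
per_attribute = add_iqr_flags(rows, detection_per_attribute=True)
print(unused_indicators(per_attribute, 2))    # [4, 5]
```

In this toy data, the second attribute never produces an outlier or extreme value. Its indicator pair (columns 4 and 5) is therefore the kind that step 8 removes.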
Unsupervised Instance Filter – RemoveWithValues: This filter removes instances
according to the values of an attribute.

1. Once we know which instances have outliers or extreme values, we can remove the
instances with outliers from the dataset entirely. Choose RemoveWithValues from the
unsupervised instance filters in the drop-down list, then left-click the filter's
box. The Outlier attribute has index 23 and “yes” is its last nominal value, so set
attributeIndex to 23 and nominalIndices to last.
2. Confirm the changes and click Apply. 70 instances are removed from the dataset,
and the Outlier attribute no longer has any “yes” values.

3. You could also remove instances in the same way according to the per-attribute
outlier/extreme-value indicator pairs generated with detectionPerAttribute set to True.
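RemoveWithValues and the removal steps above can be sketched as follows. `remove_with_values` is a hypothetical helper that loosely mirrors the filter's attributeIndex, nominalIndices and invertSelection options. Attribute and label numbers are 1-based, and "first"/"last" select the first or last label. Matching rows are dropped unless `invert` is set. `remove_flagged` applies the helper once per indicator attribute, as in step 3. The toy rows stand in for the real 24-attribute data (22 original attributes plus Outlier and ExtremeValue).

```python
def remove_with_values(rows, attribute_index, labels, label_spec="last", invert=False):
    """Drop rows whose nominal value at a 1-based attribute index matches the
    labels chosen by label_spec ("first", "last" or e.g. "1,2"); with
    invert=True keep only the matching rows instead."""
    col = attribute_index - 1  # attributes are numbered from 1, as in WEKA
    if label_spec == "first":
        chosen = {labels[0]}
    elif label_spec == "last":
        chosen = {labels[-1]}
    else:
        chosen = {labels[int(i) - 1] for i in label_spec.split(",")}
    # Keep a row when "matches" differs from the default removal sense.
    return [row for row in rows if (row[col] in chosen) == invert]


def remove_flagged(rows, indicator_indices, labels=("no", "yes")):
    """Apply remove_with_values once per indicator attribute (step 3)."""
    for index in indicator_indices:
        rows = remove_with_values(rows, index, labels)
    return rows


# Toy rows: two numeric attributes followed by Outlier and ExtremeValue flags.
rows = [
    [12, 5, "no", "no"],
    [40, 5, "no", "yes"],
    [30, 6, "yes", "no"],
    [15, 6, "no", "no"],
]
kept = remove_with_values(rows, 3, ("no", "yes"))
print(len(rows) - len(kept), "instance(s) removed")
print(remove_flagged(rows, [3, 4]))
```

Removing on Outlier alone keeps the row flagged only as an extreme value. That matches the lab, which removes outliers only. On the real dataset, the equivalent first call would use attribute index 23.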
