Extended Isolation Forest Explained
Abstract—We present an extension to the model-free anomaly detection algorithm Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with the assignment of anomaly scores to given data points. We motivate the problem using heat maps for anomaly scores. These maps suffer from artifacts generated by the criteria for the branching operation of the binary tree. We explain this problem in detail and demonstrate visually the mechanism by which it occurs. We then propose two different approaches for improving the situation. First, we propose transforming the data randomly before the creation of each tree, which results in averaging out the bias. Second, and the preferred way, we allow the slicing of the data to use hyperplanes with random slopes. This approach remedies the artifacts seen in the anomaly score heat maps. We show that the robustness of the algorithm is much improved using this method by looking at the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence or in computation time between the standard Isolation Forest and EIF.
1       INTRODUCTION
In the age of big data and high volume information, anomaly detection finds many areas of application, including network security, financial data, medical data analysis, and the discovery of celestial events from astronomical surveys, among many more. The need for reliable and efficient algorithms is widespread, and many techniques have been developed over the years to address it, including methods for multivariate data [15] and, more recently, for streaming data that requires updates in the presence of missing variables [14]. For a survey of research in anomaly detection, see [3], [6]. Among the different anomaly detection algorithms, Isolation Forest [9], [11] is one with unique capabilities. It is a model-free algorithm that is computationally efficient, can easily be adapted for use with parallel computing paradigms [7], and has proven to be very effective in detecting anomalies [16]. The main advantage of the algorithm is that it does not rely on building a profile for the data in an effort to find nonconforming samples. Rather, it exploits the fact that anomalous data are "few and different". Most anomaly detection algorithms find anomalies by understanding the distribution of their properties and isolating them from the rest of the normal data samples [4], [5], [13]. In an Isolation Forest, data is sub-sampled and processed in a tree structure based on random cuts in the values of randomly selected features in the dataset. Samples that travel deeper into the tree branches are less likely to be anomalous, while shorter branches are indicative of anomaly. As such, the aggregated lengths of the tree branches provide a measure of anomaly, or an "anomaly score", for every given point. However, before using this anomaly score for processing of data, there are issues with the standard Isolation Forest algorithm that need to be addressed.

It turns out that while the algorithm is computationally efficient, it suffers from a bias that arises from the way the branching of the trees takes place. In this paper we study this bias, discuss where it comes from, and examine what effects it has on the anomaly score of a given data point. We further introduce an extension to the Isolation Forest, named Extended Isolation Forest (EIF), that remedies this shortcoming. While the basic idea is similar to that of the Isolation Forest, the details of the implementation are modified to accommodate a more general approach.

As we will see later, the proposed extension relies on the use of hyperplanes with random slopes (non-axis-parallel) for splitting the data when creating the binary search trees. The idea of using hyperplanes with random slopes in the context of Isolation Forest is mentioned in [10]. However, the main focus there is to find optimal slicing planes in order to detect clustered anomalies. No further discussion on developing this method, or on why it might benefit the algorithm, is presented. The methods for choosing random slopes for hyperplanes, the effects of these choices on the algorithm, and their consequences, especially for higher dimensional data, are also omitted. Using oblique hyperplanes in tree-based methods has also been studied for random forests [12] but, again, not in the context of Isolation Forest. An approach to anomaly detection which utilizes the concept of isolation is presented in [1], [2], where a nearest neighbor algorithm, instead of a tree-based method, is used for isolating data points. While this algorithm performs better than the Isolation Forest by [...]

S. Hariri is with the Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Champaign, IL 61820 USA. E-mail: sahandha@gmail.com.
M. Carrasco Kind is with the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Champaign, IL 61820 USA. E-mail: mcarras2@illinois.edu.
R. J. Brunner is with the Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL 61820 USA. E-mail: bigdog@illinois.edu.
Fig. 1. Data and anomaly score map produced by Isolation Forest for two dimensional normally distributed points with zero mean and unit covariance matrix. Darker areas indicate higher anomaly scores.

Fig. 2. Data points and anomaly score maps of two clusters of normally distributed points.

Fig. 3. Data with a sinusoidal structure and the anomaly score map.

[...] this case, if we observed an anomalous data point that happened to lie near the origin, the algorithm would most likely categorize it as a nominal point. Given the nature of these artifacts, it is clear that these results cannot be fully trusted for more complex data point distributions where this effect is enhanced. Not only does this increase the chances of false positives, it also wrongly indicates a non-existent structure in the data.

As a third example, we will look at a dataset with more structure, as shown in Fig. 3. In this case the data has an inherent structure, the sinusoidal shape, with Gaussian noise added on top. It is very important for the algorithm to detect this structure, which represents a more complex case. However, looking at the anomaly score map generated by the standard Isolation Forest, we can see that the algorithm performs very poorly. Essentially this data is treated as one large rectangular blob with horizontal and vertical bands emanating parallel to the coordinate axes, as seen before. An anomalous data point that lies in between two "hills" will get a very low score and be categorized as a nominal point.

In this work, we will discuss why these undesirable features arise, and how we can fix them using an extension of the Isolation Forest algorithm.
3 GENERALIZATION OF ISOLATION FOREST

3.1 Overview
The general algorithm for Isolation Forest [9], [11] starts with the training of the data, which in this case is the construction of the trees. Given a dataset of dimension N, the algorithm chooses a random sub-sample of data to construct a binary tree. The branching process of the tree occurs by selecting a random dimension x_i with i ∈ {1, 2, ..., N} of the data (a single variable). It then selects a random value v within the minimum and maximum values in that dimension. If a given data point possesses a value smaller than v for dimension x_i, then that point is sent to the left branch, otherwise it is sent to the right branch. In this manner the data at the current node of the tree is split in two. This process of branching is performed recursively over the dataset until a single point is isolated, or a predetermined depth limit is reached. The process begins again with a new random sub-sample to build another randomized tree. After building a large ensemble of trees, i.e., a forest, the training is complete. During the scoring step, a new candidate data point (or one chosen from the data used to create the trees) is run through all the trees, and an ensemble anomaly score is assigned based on the depth the point reaches in each tree, as shown in Fig. 4.
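To make this branching rule concrete, a single axis-parallel split can be sketched in a few lines of Python. This is a minimal illustration with our own naming, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_node(X):
    """One axis-parallel branching step of the standard Isolation Forest."""
    i = rng.integers(X.shape[1])                   # pick a random dimension x_i
    v = rng.uniform(X[:, i].min(), X[:, i].max())  # random value within [min, max]
    left, right = X[X[:, i] < v], X[X[:, i] >= v]  # partition the node's points
    return i, v, left, right
```

Recursing on the left and right partitions until a node holds a single point, or until the depth limit is hit, yields one isolation tree.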
Fig. 4. Schematic representation of a single tree (a) and a forest (b), where each tree is a radial line from the center to the outer circle. Red represents an anomaly while blue represents a nominal point.

Fig. 4a shows a single tree after training. The red line depicts the trajectory of a single anomalous point traveling down the tree, while the blue line shows that of a nominal point. The anomalous point is isolated very quickly, but the nominal point travels all the way to the maximum depth in this case. In Fig. 4b we can see a full forest for an ensemble of 60 trees. Each radial line represents one tree, while the outer circle represents the maximum depth limit. The red lines are trajectories that the single anomalous point has taken down each tree, and the blue ones show the trajectories of a nominal point. As can be seen, on average, the blue lines achieve a much larger radius than the red ones. It is on the basis of this idea that Isolation Forest is able to separate anomalies from nominal points.

Using the two dimensional data from Fig. 1a as a reference, during the training phase, the algorithm will create a tree by recursively picking a random dimension and comparing the value of any given point to a randomly selected cutoff value for the selected dimension. The data points are then sent down the left or the right branch according to the rules of the algorithm. By creating many such trees, we can use the average depths of the branches to assign anomaly scores, so any newly observed data point can traverse down each tree following the trained rules. The average depth of the branches this data point traverses is translated into an anomaly score using Equation (1):

    s(x, n) = 2^(-E(h(x)) / c(n)),    (1)

where E(h(x)) is the mean value of the depths a single data point, x, reaches in all trees, and c(n) is a normalizing factor defined as the average depth of an unsuccessful search in a Binary Search Tree (BST):

    c(n) = 2 H(n-1) - 2(n-1)/n,    (2)

where H(i) is the harmonic number, which can be estimated by ln(i) + 0.5772156649 (Euler's constant) [11], and n is the number of points used in the construction of the trees.

Trees are assigned maximum depth limits as described in [9]. In case a point runs too deep in the tree, the branching process is stopped and the point is assigned the maximum depth value.
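Equations (1) and (2) translate directly into code. The sketch below (our naming) computes c(n) and the anomaly score of a point from its path lengths across the forest:

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Equation (2): average depth of an unsuccessful BST search on n points."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(depths, n):
    """Equation (1): s(x, n) = 2^(-E(h(x)) / c(n)).

    depths -- path lengths h(x) of one point, one entry per tree
    n      -- sub-sample size used to build each tree
    """
    return 2.0 ** (-np.mean(depths) / c_factor(n))
```

Scores close to 1 indicate likely anomalies, while scores well below 0.5 indicate nominal points.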
Fig. 5. (a) The branching process for an anomalous data point (red point). The branching takes place until the point is isolated; in this case it only took three random cuts. (b) The same branching process for a nominal point (red point). Since the point is buried deep inside the data, it takes many cuts to isolate it; in this case the depth limit of the tree was reached before the point was completely isolated. The numbers on the lines represent the order of the branching process.

Fig. 6. Branch cuts generated by the standard Isolation Forest during the training phase. In each step, a random value is selected from a random feature (dimension). The training data points then go down the left or right branch of the tree based on which side of that line they reside.

Fig. 5 shows the branching process during the training phase for an anomalous and a nominal example point using the standard Isolation Forest for the data depicted in Fig. 1. The numbers on the branching lines represent the order in which they were created. In the case of the nominal point, the depth limit was reached, since the point is very much at the center of the data and requires many random cuts to be completely isolated. In this case the random cuts are all either vertical or horizontal. This is because of how we define the branching criteria (picking one dimension at each step). This is also the reason for the strange behavior we observed in the anomaly score maps of the previous section.

To build intuition for why this is, let's consider one randomly selected, fully grown tree with all its branch rules and criteria. Fig. 6 shows the branch cuts generated for a tree for the three examples we introduced in Section 2. Note that in each step, we pick a random feature (dimension), x_i, and a random value, v, for this feature. Naturally, the branch cuts are simply parallel to the coordinate axes. But as we move down the branches of the tree and the data is divided by these lines, the range of possible values for v decreases, and so the lines tend to cluster where most of the data points are concentrated. However, because of the constraint that the branch cuts are only vertical and horizontal, regions that don't necessarily contain many data points end up with many branch cuts through them. In fact, Fig. 7 shows a typical distribution of the only possible branching lines (hyperplanes in general) that could be produced. In Fig. 7a, a data point near (4, 0) will be subject to many more branching operations than a point that falls near (3, 3), despite the fact that they are both anomalies with respect to the center of the dataset. As the number of branching operations is directly related to the anomaly score, these two points might end up with very different scores.

Fig. 7. A typical distribution of the only possible branch cuts. The biased treatment of various regions in the domain of the data accumulates as many trees are generated, and as a result, regions of similar anomaly scores are subject to very different scoring possibilities.

The same phenomenon happens in the other two cases, but it is even worse, as these branch cuts intersect and form "ghost" cluster regions, e.g., near (0, 0) in Fig. 7b, as seen in the previous anomaly score maps. Fig. 7c shows an even more extreme case. The data repeats itself over one feature many times, and so the branch cuts in one direction are created in a much more biased fashion, resulting in an inability to decipher the structure of the data, as we saw in Fig. 3b. It is then evident that regions of similar anomalous properties receive very different branching operations throughout the training process.

We propose two different ways of addressing this limitation to provide a much more robust algorithm. In the first method, the standard Isolation Forest algorithm is allowed to run as is, but with a small modification: each time it picks a sub-sample of the training data to create a tree, the training data undergoes a random transformation (rotation in plane). In this case each tree has to carry the information about this transformation, as it will be necessary in the scoring stage, increasing the bookkeeping. In the second method, we modify and generalize the branching process by allowing the branch cuts to occur in random directions with respect to the current data points at the tree node. Under this perspective, the latter method is the truly general case, while the former can be considered a special case of this general method. In fact, we will show in the rest of the paper that the standard Isolation Forest is also a special case of this general method. In the following two sub-sections we explore and present each method.

3.2 Rotated Trees
Before we present the completely general case, we discuss rotation of the data in order to minimize the effect of the rectangular slicing. In this approach the construction of each individual tree takes place using the standard Isolation Forest, but before that happens, the sub-sample of the data that is used to construct each tree is rotated in plane by a random angle. As such, each tree is "twisted" in a unique way, differently from all the other trees, while keeping the structure of the data intact. Once the training is complete and we are computing the scores of the data points, the same transformation has to take place for each tree before the point is allowed to run down the tree.
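A minimal sketch of this first method for two dimensional data follows. The helpers build_standard_tree and path_length are placeholders for the usual Isolation Forest routines; the essential point is that each tree stores its own rotation so it can be re-applied at scoring time:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation():
    """Draw a random in-plane rotation matrix for 2-D data."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def train_rotated_forest(X, n_trees, sample_size):
    """Rotate each sub-sample by its own random angle, then build as usual."""
    forest = []
    for _ in range(n_trees):
        R = random_rotation()
        sample = X[rng.choice(len(X), sample_size, replace=False)]
        tree = build_standard_tree(sample @ R.T)  # axis-parallel cuts on rotated data
        forest.append((R, tree))                  # extra bookkeeping: store R
    return forest

def depths_for_point(x, forest):
    """Apply each tree's stored rotation before descending that tree."""
    return [path_length(R @ x, tree) for R, tree in forest]
```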
Fig. 11. Sample branch hyperplane for three dimensional data at each level of extension. The case of extension 0 (Fig. 11c) is identical to the standard Isolation Forest.

Fig. 12. Comparison of the standard Isolation Forest, Rotated Isolation Forest, and Extended Isolation Forest for the case of the single blob.

[...] change the test condition to reflect inequality (3). In addition, we have added line 6, which allows the extension level to change. With these changes, the algorithm can be used either as the standard Isolation Forest or as the Extended Isolation Forest with any desired extension level. This extension customization can be useful, for example, for complex multidimensional data with low cardinality in one dimension, where that feature can be ignored to reduce the introduction of possible selection bias.

In Algorithm 3, we make the changes accordingly. We now have to receive the normal and intercept point from each tree, and use the appropriate test condition to set off the recursion for figuring out the path length (depth of each branch).

Algorithm 3. PathLength(x, T, e)
Input: x - an instance, T - an iTree, e - current path length, to be initialized to zero when first called
Output: path length of x
 1: if T is an external node then
 2:   return e + c(T.size) {c(.) is defined in Equation (2)}
 3: end if
 4: n <- T.Normal
 5: p <- T.Intercept
 6: if (x - p) · n <= 0 then
 7:   return PathLength(x, T.left, e + 1)
 8: else if (x - p) · n > 0 then
 9:   return PathLength(x, T.right, e + 1)
10: end if
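Algorithm 3 maps almost line for line onto Python. In the sketch below, the node attributes (is_external, size, normal, intercept, left, right) are our own naming for the quantities the algorithm reads from each iTree node, and c_factor is Equation (2) as sketched earlier:

```python
def path_length(x, T, e=0):
    """Recursive path length of Algorithm 3 for instance x in iTree T."""
    if T.is_external:
        return e + c_factor(T.size)   # c(.) is defined in Equation (2)
    n, p = T.normal, T.intercept      # the branch cut's normal and intercept
    if (x - p) @ n <= 0:              # which side of the hyperplane is x on?
        return path_length(x, T.left, e + 1)
    return path_length(x, T.right, e + 1)
```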
As we can see, the algorithm can easily be modified to take into account all the necessary changes for this extension. We have made publicly available a Python implementation of these algorithms,1 accompanied by example Jupyter notebooks to reproduce the figures in the text.

4 RESULTS AND DISCUSSION
Here we present the results of the Extended Isolation Forest compared to the standard Isolation Forest. We compare the anomaly score maps, variance plots, and convergence of the anomaly scores. We also report the AUROC and AUPRC values in order to quantify the comparison between the two methods.

[...] extending the algorithm. Note that these plots are only possible for two dimensional data, even though it is somewhat manageable to visualize these maps and artifacts in 3-D. We will use different metrics in order to analyze higher dimensional data.

First consider the case of a single blob normally distributed in 2-D space. Fig. 12 compares the anomaly score maps for all the mentioned methods. By comparing Figs. 12a and 12b we can already see a considerable difference in the anomaly score map. The rotation of the sub-sampled data for each tree averages out the artifacts and produces a more symmetric map, as expected in this example. Taking it a step further, we look at the score map in Fig. 12c, which was obtained by the Extended Isolation Forest. We can see an anomaly score map as we might expect for this example. The low score artificial bands in the x and y directions are no longer present, and the score increases roughly monotonically in all directions as we move radially outward from the center, in a quasi-symmetric way.

Now we turn to the case of multiple blobs presented in Fig. 13. As we saw for the single blob case, and as shown in Fig. 13a, the creation of branch cuts that were only parallel to the coordinate axes was causing two "ghost" regions in the score map, in addition to the bands observed in the single blob case. As shown in Figs. 13b and 13c, in both cases, namely the Rotated and Extended Isolation Forest, these "ghost" regions are completely gone. The two blobs are clearly scored with low anomaly scores, as expected. The interesting feature to notice here is the higher anomaly score region directly in between the two blobs. The Extended Isolation Forest algorithm is able to capture this detail quite well. This additional feature is important, as this region can be considered close to nominal given the proximity to the blobs, but with a higher score since it is far from the concentrated distribution. The standard Isolation Forest fails to capture this extra structure in the data.
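Score maps like those in Fig. 12 can be reproduced with the authors' public implementation. The sketch below assumes the iForest and compute_paths interface of the eif package (see footnote 1); the exact signature may differ between versions:

```python
import numpy as np
import eif as iso

X = np.random.randn(2000, 2)  # a single 2-D blob of nominal data

# ExtensionLevel=1 is fully extended in 2-D; 0 reproduces the standard behavior.
forest = iso.iForest(X, ntrees=200, sample_size=256, ExtensionLevel=1)

# Score a regular grid of points to build the anomaly score map.
xx, yy = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-5, 5, 100))
grid = np.column_stack([xx.ravel(), yy.ravel()])
scores = forest.compute_paths(X_in=grid).reshape(xx.shape)
```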
Fig. 14. Comparison of the standard Isolation Forest with the Rotated Isolation Forest and the Extended Isolation Forest for the case of the sinusoid.

Fig. 16. Mean and variance of the scores for data points at fixed distances from the center line.
As for the sinusoidal data, we observed before that the standard Isolation Forest failed to detect the structure of the data and treated it as one rectangular blob with very wide rectangular bands. Fig. 14 shows a comparison of the results for this case. The improvement in both cases over the standard Isolation Forest is clear. The structure of the data is better preserved, and the anomaly score map is representative of the data. In the case of the Extended Isolation Forest, the score map tracks the sinusoidal data even more tightly than in the rotated case.

4.2 Variance of the Anomaly Scores
Besides looking at anomaly score maps, we can consider other metrics in order to compare the Extended Isolation Forest with the standard one. One such metric is the mean and variance of the anomaly scores of points distributed along roughly constant score lines for various cases. The advantage of this metric is that we can use it for higher dimensional data.

Fig. 15. Mean and variance of the scores for data points located radially at fixed distances from the center.

Consider our randomly distributed data in two dimensions, shown in Fig. 15a. The blue dots represent the training data used to create the forest in each case. The gray dots form concentric circles. Since our data is normally distributed, anomaly scores along each circle should remain more or less constant. The red lines represent the circles at 1σ, 2σ, and 3σ of the original normal distribution from which the training data is sampled. We will run the data from the gray circles through our trained Isolation Forest for the standard, rotated, and extended cases, and look at the variation of the score as a function of distance from the center.
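This experiment is straightforward to sketch. Here score_point is a placeholder for the scoring function of whichever trained forest is being tested (standard, rotated, or extended):

```python
import numpy as np

def scores_on_circle(radius, score_point, n_points=100):
    """Score points placed uniformly on a circle of the given radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    pts = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.array([score_point(p) for p in pts])

radii = np.linspace(0.1, 6.0, 30)
stats = []
for r in radii:
    s = scores_on_circle(r, score_point)
    stats.append((r, s.mean(), s.var()))  # mean and variance per radius
```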
From Fig. 15b we can see that the mean score varies in the same way among the three methods. It increases quickly within 3σ of the center and then keeps monotonically increasing. At higher values the rate of increase slows down, until it converges far away from the center, where the blob resembles a single point.

As far as the variance of the scores goes, looking at Figs. 12 and 15c, we can see that for very small anomaly scores (close to the center of the blob), the variance is quite small in all three cases. All three methods are able to detect the center of concentration of the data quite well. This, however, is not the goal of anomaly detection algorithms. After about 3σ, the variance among the scores computed by the Extended Isolation Forest, and by the Isolation Forest with rotated trees, settles to a much smaller value compared to the standard Isolation Forest, which translates to a much more robust measurement. It is notable that this inconsistency occurs in regions of high anomaly, which is very important for an anomaly detection algorithm. In short, for data in regions of high anomaly likelihood, the standard Isolation Forest produces anomaly scores which vary largely depending on the alignment of these data points with the training data and the axes, rather than on whether they are truly anomalous points or not. This increases the chances of false positives in the scoring stage.

We can do a similar study for the sine curve by considering lines at similar distances above and below where the data points lie, as shown in Fig. 16a. We assume data lying on each line should be scored more or less the same. This is, however, not quite true. If the data stretched to infinity in the horizontal direction, or if we had periodic boundary conditions in that direction, this would indeed be true. However, since our data is bounded in the horizontal direction, there will be variations in the scores along each line, as in the map shown in Fig. 14c.

Nonetheless, we performed the analysis as shown in Fig. 16. Similarly to the previous plots, the three red lines on each side of the center line correspond to 1σ, 2σ, and 3σ of the Gaussian noise added to the sine curve. The gray lines represent data points that lie at a constant distance from the center of the data. These are points that are used in the scoring stage. The blue data points are used for training. Fig. 16b shows the change in the mean score as we move out on both sides in the y direction. We observe that in this case, the mean scores behave differently. The mean score for the standard Isolation Forest reaches a constant value very quickly. This is consistent with what we observed in the score maps, Fig. 14a. We saw that the standard Isolation Forest was unable to capture the structure present in this dataset, and treated it as a rectangular blob. The rapid rise of the anomaly score in this case represents moving through the boundary of this rectangular region into a high anomaly score area. Outside this region, all data points look the same to the standard Isolation Forest. This rapid rise is undesirable. In addition to not capturing the structure of the data, these scores are much more sensitive to the choice of an anomaly threshold value, which results in a reduced AUROC, as we will see below.
TABLE 1. AUC values for both ROC and PRC for the example datasets, using the standard Isolation Forest and the Extended Isolation Forest.
In the other two cases, we observe a similar trend as before, where the mean converges to a steady score value, but much more slowly, making it less sensitive to the choice of threshold value.

Fig. 16c shows the variance of the scores obtained. We notice that, as before, the variance is much higher for the standard Isolation Forest, and lower for the Extended Isolation Forest. After 3σ, the variance for the standard case drops rapidly to a constant value, but one higher than that obtained by EIF, as the mean score reaches a steady value. The constant value here is a clear indication that the standard Isolation Forest is unable to capture any useful information outside the rectangular region discussed before. The EIF again provides a more robust scoring prediction.
In addition to the two examples above, we can perform the same analysis on higher dimensional data. We choose 3-D and 4-D blobs of normally distributed data centered at the origin with unit variance in all directions, similarly to the 2-D case. Recall that as soon as we move beyond the two dimensional case, we have more levels of extension that we can consider with regard to generating the plane cuts. These are represented by Ex_n, as discussed above. We do not include the rotated case in the following analysis.

As before, and not surprisingly, we can see from Figs. 17 and 18 that the mean score increases with increasing radius very similarly in all cases, which was also observed in the 2-D blob case. However, again we see that in the region beyond 3σ, which is where the anomalies lie, the Extended Isolation Forest produces much lower values in the score variance. We also observe that for each case, the variance decreases as the extension level increases. In other words, the Extended Isolation Forest, i.e., the fully general case (when the planes subdividing the data have no constraints and the normal can [...]
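For an N-dimensional dataset, one way to realize these extension levels, consistent with the description above (the naming is ours and this is a sketch, not the authors' code), is to draw a normal vector with every coordinate from N(0, 1) and then zero out coordinates until only extension_level + 1 of them remain free:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hyperplane(X, extension_level):
    """Draw a branch-cut hyperplane (normal n, intercept p) for the data X.

    extension_level = 0 leaves one free coordinate and reduces to the
    standard Isolation Forest; N - 1 is the fully extended case.
    """
    N = X.shape[1]
    n = rng.normal(size=N)                          # random slope in every dimension
    frozen = rng.choice(N, N - extension_level - 1, replace=False)
    n[frozen] = 0.0                                 # constrain the unused coordinates
    p = rng.uniform(X.min(axis=0), X.max(axis=0))   # intercept within the data range
    return n, p
```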
4.3 AUC Comparison
In this section we report AUC values for both ROC and PRC and compare the two methods. Initially, we produce these results for the examples provided earlier, since these examples are designed to highlight the specific shortcoming of Isolation Forest studied in this paper. We then report the AUC for ROC and PRC for some real-world benchmark datasets of varying sizes and dimensions.

In order to obtain the AUC values for the example datasets, we first train both algorithms on 2000 data points in each case, distributed as seen previously. We then add 200 anomalous points, score the data, and compute the AUCs. The results are summarized in Table 1. In all cases the improvement is obvious, especially in the case of the double blob, where we have "ghost" regions, and the sinusoid, where we have a high variability in the structure of the dataset. This highlights the importance of understanding the problem that arises from using axis-parallel hyperplanes.
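The evaluation described here can be sketched with scikit-learn's metrics. Below, scores is assumed to hold the anomaly scores that either forest assigns to the combined 2200 points:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Labels: 0 for the 2000 nominal points, 1 for the 200 injected anomalies.
y_true = np.concatenate([np.zeros(2000), np.ones(200)])

auroc = roc_auc_score(y_true, scores)              # area under the ROC curve
auprc = average_precision_score(y_true, scores)    # AUPRC as average precision
```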
We perform the same analysis on some real-world benchmark datasets. Table 2 lists the datasets used, as well as their properties. The results are summarized in Table 3. We can see the improved AUC values in the case of the Extended Isolation Forest. Even though in some cases the improvement may seem modest, in others there is an appreciable difference. Naturally, the axis-parallel limitation affects some datasets more than others. Nevertheless, we can see that the EIF in general achieves better results than the standard Isolation Forest.

4.4 Convergence of Anomaly Scores
In a final comparison, we would like to find out how efficient the Extended Isolation Forest is compared to the standard case. For this purpose we consider convergence plots for the anomaly score as a function of forest size (the number of trees). We perform this for the 2-D, 3-D, and 4-D blobs. In the case of 2-D we also include the method of tree rotations [...]
TABLE 2. Table of data properties.
TABLE 3. AUC values for both ROC and PRC for the benchmark datasets, using the standard Isolation Forest and the Extended Isolation Forest.
Fig. 19. Convergence plots for the 2-D blob case and the three methods described in the text.

Fig. 20. Convergence plots for the 3-D blob for the standard and fully extended cases.
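A convergence curve of this kind can be sketched by scoring a fixed point with ever larger prefixes of the same forest, reusing c_factor from Equation (2) as sketched earlier. Here depths holds the point's path length in each tree and n is the sub-sample size:

```python
import numpy as np

def score_vs_forest_size(depths, n):
    """Anomaly score of one point as a function of the number of trees used."""
    depths = np.asarray(depths, dtype=float)
    running_mean = np.cumsum(depths) / np.arange(1, len(depths) + 1)
    return 2.0 ** (-running_mean / c_factor(n))  # Equation (1) for each prefix
```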
[...] benchmark data. The EIF performed consistently better than the standard Isolation Forest in all cases considered. This extension yields more reliable and robust anomaly scores and, in some cases, more accurate detection of the structure of a given dataset. We additionally saw that these results can be achieved without sacrificing computational efficiency. We have made a Python implementation of the Extended Isolation Forest algorithm publicly available,2 accompanied by example Jupyter notebooks to reproduce the figures in this paper.

Sahand Hariri received the bachelor's and master's degrees in mechanical engineering from Temple University. He then joined Wolfram Alpha LLC, where he was a research developer for two and a half years. After that, while a graduate student in theoretical and applied mechanics at the University of Illinois, he worked at the National Center for Supercomputing Applications through the summers of 2016, 2017, and 2018, where he met and started his collaboration with Matias Carrasco Kind. His interests include applications of computational methods in the sciences, more specifically dynamical and complex systems. Currently, he works at PSI Metals GmbH in Berlin, Germany.
2. https://github.com/sahandha/eif